PREFACE


This  book  has  three  goals: 

1.  To  introduce  students  to  the  elegant  theory  that  underlies  modern  computing. 

2.  To  motivate  students  by  showing  them  that  the  theory  is  alive.  While  much  of  it 
has  been  known  since  the  early  days  of  digital  computers  (and  some  of  it  even 
longer),  the  theory  continues  to  inform  many  of  the  most  important  applications 
that  are  considered  today. 

3.  To  show  students  how  to  start  looking  for  ways  to  exploit  the  theory  in  their  own 
work. 

The core of the book, as a standard textbook, is Parts I through V. They address the first
of the stated goals. They contain the theory that is being presented. There is more material
in them than can be covered in a one-semester course. Sections that are marked
with a • are optional, in the sense that later material does not, for the most part, depend
on them. The Course Plans section on page xv suggests ways of selecting sections
that are appropriate for some typical computer science courses.

Then  there  are  seventeen  appendices: 

• Appendix A reviews the mathematical concepts on which the main text relies. Students
should be encouraged to review it during the first week of class.

•  Appendix  B  describes  techniques  for  working  with  logical  formulas  (both  Boolean 
and  first-order). 

• Appendices C, D, E, and F treat selected theoretical concepts in greater depth. In
particular, they contain the details of some proofs that are only sketched in the
main text.

• Appendices G through Q address the second and third goals. They describe applications
of the techniques that are described in the main body of the book. They also
contain some interesting historical material. Although they are long (at least in
comparison to the space that is devoted to applications in most other books in this
area), they only skim the surface of the applications that they present. But my hope
is that that is enough. The World Wide Web has completely changed our ability to
access knowledge. What matters now is to know that something exists and thus to
look for it. The short discussions that are presented in these appendices will, I hope,
give students that understanding.

There is a Web site that accompanies this book: http://www.theoryandapplications.org/.
It is organized into the same sections as the book, so that it is easy to follow the two in
parallel. The symbol Q following a concept in the text means that additional material is
available on the Web site.

Throughout the text, you'll find pointers to the material in the appendices, as well as
to material on the book’s Web site. There are also some standalone application notes.
These pointers and notes are enclosed in boxes, and refer you to the appropriate appendix
and page number or to the Web. The appendix references look like this:


This technique really is useful. (H.1.2)


Notation 

It  is  common  practice  to  write  definitions  in  the  following  form: 

A something is a special something if it possesses property P.

This form is used even though property P is not only a sufficient but also a necessary
condition for being a special something. For clarity we will, in those cases, write “if and
only if”, abbreviated “iff”, instead of “if”. So we will write:

A  something  is  a  special  something  iff  it  possesses  property  P. 


Throughout the book we will, with a few exceptions, use the following naming conventions:

                                                                                            Examples

sets                                        capital letters, early in the alphabet, plus S  A, B, C, D, S
logical formulas                            capital letters, middle of the alphabet         P, Q, R
predicates and relations                    capital letters, middle of the alphabet         P, Q, R
logical constants                           subscripted X's and specific names              X1, X2, John, Smoky
functions                                   lower case letters or words                     f, g, convert
integers                                    lower case letters, middle of the alphabet      i, j, k, l, m, n
string-valued variables                     lower case letters, late in the alphabet        s, t, u, v, w, x, y
literal strings                             written in computer font                        abc, aabbb
language-valued variables                   upper case letters starting with L              L, L1, L2
specific languages                          nonitalicized strings                           AⁿBⁿ, WW
regular expressions                         lower case Greek letters                        α, β, γ
states                                      lower case letters, middle of the alphabet      p, q, r, s, t
nonterminals in grammar rules               upper case letters                              A, B, C, S, T
working strings in grammatical derivations  lower case Greek letters                        α, β, γ
strings representing a PDA's stack          lower case Greek letters
other variables                             lower case letters, late in the alphabet        x, y, z

Programs and algorithms will appear throughout the book, stated at varying levels
of detail. We will use the following formats for describing them:

• Exact code in some particular programming language will be written the same way
other strings are written.

• Algorithms that are described in pseudocode will be written as:

Until  an  even-length  string  is  found  do: 

Generate  the  next  string  in  the  sequence. 

When  we  want  to  be  able  to  talk  about  the  steps,  they  will  be  numbered,  so  we  will 
write: 

1.  Until  an  even-length  string  is  found  do: 

1.1.  Generate  the  next  string  in  the  sequence. 

2. Reverse the string that was found.

When  comments  are  necessary,  as  for  example  in  code  or  in  grammars,  they  will  be 
preceded  by  the  string  /*. 


Course  Plans 

Appendix A summarizes the mathematical concepts on which the rest of the book relies.
Depending on the background of the students, it may be appropriate to spend one
or more lectures on this material. At the University of Texas, our students have had two
prior courses in logic and discrete structures before they arrive in my class, so I have
found that it is sufficient just to ask the students to read Appendix A and to work a selection
of the exercises that are provided at the end of it.

Part I lays the groundwork for the rest of the book. Chapter 2 is essential, since it
defines the fundamental structures: strings and languages. I have found that it is very
useful to cover Chapter 3, which presents a roadmap for the rest of the material. It
helps students see where we are going and how each piece of the theory fits into the
overall picture of a theory of computation. Chapter 4 introduces three ideas that become
important later in the book. I have found that it may be better to skip Chapter 4
at the beginning of my class and to return to each of its sections once or twice later, as
the concepts are required.

If the optional sections are omitted, Chapters 5, 6, 8, 9, 11-14, 17-21, and, optionally,
23 and/or 24 cover the material in a standard course in Automata Theory. Chapter 15
(Context-Free Parsing) contains material that many computer science students need to
see and it fits well into an Automata Theory course. I used to include much of it in my
class. But that material is often taught in a course on Programming Languages or Compilers.
In that case, it makes sense to omit it from the Automata Theory course. In its
place, I now cover the optional material in Chapter 5, particularly the section on stochastic
finite automata. I also cover Chapter 22. I’ve found that students are more
motivated to tackle the difficult material (particularly the design of reduction proofs)
in Chapter 21 if they can see ways in which the theory of undecidability applies to
problems that are, to them, more intriguing than questions about the behavior of Turing
machines.

This text is also appropriate for a broader course that includes the core of the classic
theory of automata plus the modern theory of complexity. Such a course might
cover Chapters 2-3, 5, 8, 11, 13, 17-21, and 27-30, omitting sections as time pressures
require.

This text is unique in the amount of space it devotes to applications of the core theoretical
material. In order to make the application discussions coherent, they are separated
from the main text and occur in the appendices at the end of the book. But I have
found that I can substantially increase student interest in my course by sprinkling application
discussions throughout the term. The application references that occur in the
main text suggest places where it makes sense to do that.


Resources  for  Instructors 

There is a website, www.prenhall.com/rich, that contains materials that have been designed
to make it easy to teach from this book. In particular, it contains:

•  a  complete  set  of  Powerpoint  slides, 

•  solutions  to  many  of  the  Exercises,  and 

•  additional  problems,  many  of  them  with  solutions. 

I  would  like  to  invite  instructors  who  use  this  book  to  send  me  additional  problems 
that  can  be  shared  with  other  users. 


CHAPTER 1


Why  Study  the  Theory 
of  Computation? 


In this book, we present a theory of what can be computed and what cannot. We also
sketch some theoretical frameworks that can inform the design of programs to solve
a wide variety of problems. But why do we bother? Why don’t we just skip ahead and
write the programs that we need? This chapter is a short attempt to answer that question.


1.1  The  Shelf  Life  of  Programming  Tools 

Implementations come and go. In the somewhat early days of computing, programming
meant knowing how to write code like:¹


    ENTRY    SXA   4,RETURN
             LDQ   X
             FMP   A
             FAD   B
             XCA
             FMP   X
             FAD   C
             STO   RESULT
    RETURN   TRA   0
    A        BSS   1
    B        BSS   1
    C        BSS   1
    X        BSS   1
    TEMP     BSS   1
    STORE    BSS   1
             END

¹This program was written for the IBM 7090. It computes the value of a simple quadratic ax² + bx + c.

In 1957, Fortran appeared and made it possible for people to write programs that looked
more straightforwardly like mathematics. By 1970, the IBM 360 series of computers was
in widespread use for both business and scientific computing. To submit a job, one keyed
onto punch cards a set of commands in OS/360 JCL (Job Control Language). Guruhood
attached to people who actually knew what something like this meant:²

    //MYJOB    JOB (COMPRESS),'VOLKER BANDKE',CLASS=P,COND=(0,NE)
    //BACKUP   EXEC PGM=IEBCOPY
    //SYSPRINT DD SYSOUT=*
    //SYSUT1   DD DISP=SHR,DSN=MY.IMPORTNT.PDS
    //SYSUT2   DD DISP=(,CATLG),DSN=MY.IMPORTNT.PDS.BACKUP,
    //         UNIT=3350,VOL=SER=DISK01,
    //         DCB=MY.IMPORTNT.PDS,SPACE=(CYL,(10,10,20))
    //COMPRESS EXEC PGM=IEBCOPY
    //SYSPRINT DD SYSOUT=*
    //MYPDS    DD DISP=OLD,DSN=*.BACKUP.SYSUT1
    //SYSIN    DD *
     COPY INDD=MYPDS,OUTDD=MYPDS
    //DELETE2  EXEC PGM=IEFBR14
    //BACKPDS  DD DISP=(OLD,DELETE,DELETE),DSN=MY.IMPORTNT.PDS.BACKUP

By the turn of the millennium, gurus were different. They listened to different music and
had never touched a keypunch machine. But many of them did know that the following
Java method (when compiled with the appropriate libraries) allows the user to select a
file, which is read in and parsed using whitespace delimiters. From the parsed file, the
program builds a frequency map, which shows how often each word occurs in the file:

    public static TreeMap<String, Integer> create() throws IOException
    {   Integer freq;
        String word;
        TreeMap<String, Integer> result = new TreeMap<String, Integer>();
        JFileChooser c = new JFileChooser();
        int retval = c.showOpenDialog(null);          // let the user pick a file
        if (retval == JFileChooser.APPROVE_OPTION)
        {   Scanner s = new Scanner(c.getSelectedFile());
            while (s.hasNext())                       // tokens are split on whitespace
            {   word = s.next().toLowerCase();
                freq = result.get(word);
                result.put(word, (freq == null ? 1 : freq + 1));
            }
        }
        return result;
    }


²It safely reorganizes and compresses a partitioned dataset.

Along the way, other programming languages became popular, at least within some
circles. There was a time when some people bragged that they could write code
like:³

    (⌈/V)>(+/V)-⌈/V

Today’s  programmers  can't  read  code  from  50  years  ago.  Programmers  from  the  early 
days  could  never  have  imagined  what  a  program  of  today  would  look  like.  In  the  face 
of  that  kind  of  change,  what  does  it  mean  to  learn  the  science  of  computing? 

The  answer  is  that  there  are  mathematical  properties,  both  of  problems  and  of 
algorithms for solving problems, that depend on neither the details of today’s technology
nor the programming fashion du jour. The theory that we will present in this book
addresses  some  of  those  properties.  Most  of  what  we  will  discuss  was  known  by  the 
early  1970s  (barely  the  middle  ages  of  computing  history).  But  it  is  still  useful  in  two 
key  ways: 

• It provides a set of abstract structures that are useful for solving certain classes of
problems.  These  abstract  structures  can  be  implemented  on  whatever  hardware/ 
software  platform  is  available. 

•  It  defines  provable  limits  to  what  can  be  computed,  regardless  of  processor  speed 
or  memory  size.  An  understanding  of  these  limits  helps  us  to  focus  our  design  effort 
in  areas  in  which  it  can  pay  off,  rather  than  on  the  computing  equivalent  of  the 
search  for  a  perpetual  motion  machine. 

In this book our focus will be on analyzing problems, rather than on comparing solutions
to problems. We will, of course, spend a lot of time solving problems. But our goal
will  be  to  discover  fundamental  properties  of  the  problems  themselves: 

• Is there any computational solution to the problem? If not, is there a restricted but
useful  variation  of  the  problem  for  which  a  solution  does  exist? 

•  If  a  solution  exists,  can  it  be  implemented  using  some  fixed  amount  of  memory? 

•  If  a  solution  exists,  how  efficient  is  it?  More  specifically,  how  do  its  time  and  space 
requirements  grow  as  the  size  of  the  problem  grows? 

• Are there groups of problems that are equivalent in the sense that if there is an efficient
solution to one member of the group there is an efficient solution to all the
others?


³An expression in the programming language APL Q. It returns 1 if the largest value in a three element vector
is greater than the sum of the other two elements, and 0 otherwise [Gillman and Rose 1984, p. 326]. Although
APL is not one of the major programming languages in use today, its inventor, Kenneth Iverson,
received the 1979 Turing Award for its development.


1.2 Applications of the Theory Are Everywhere

Computers  have  revolutionized  our  world.  They  have  changed  the  course  of  our  daily 
lives,  the  way  we  do  science,  the  way  we  entertain  ourselves,  the  way  that  business  is 
conducted,  and  the  way  we  protect  our  security.  The  theory  that  we  present  in  this 
book  has  applications  in  all  of  those  areas.  Throughout  the  main  text,  you  will  find 
notes  that  point  to  the  more  substantive  application-focused  discussions  that  appear  in 
Appendices  G-Q.  Some  of  the  applications  that  we'll  consider  are: 

• Languages, the focus of this book, enable both machine/machine and person/machine
communication. Without them, none of today's applications of computing
could exist.


Network communication protocols are languages. (I.1) Most web pages are
described using the Hypertext Markup Language, HTML. (Q.1.2) The Semantic
Web, whose goal is to support intelligent agents working on the Web,
exploits additional layers of languages, such as RDF and OWL, that can be
used to describe the content of the Web. (I.3) Music can be viewed as a language,
and specialized languages enable composers to create new electronic
music. (N.1) Even very unlanguage-like things, such as sets of pictures, can
be viewed as languages by, for example, associating each picture with the
program that drew it. (Q.1.3)


• Both the design and the implementation of modern programming languages rely
heavily on the theory of context-free languages that we will present in Part III. Context-free
grammars are used to document the languages’ syntax and they form the
basis for the parsing techniques that all compilers use.


The  use  of  context-free  grammars  to  define  programming  languages  and  to 
build  their  compilers  is  described  in  Appendix  G. 


•  People  use  natural  languages,  such  as  English,  to  communicate  with  each  other. 
Since  the  advent  of  word  processing,  and  then  the  Internet,  we  now  type  or  speak 
our  words  to  computers.  So  we  would  like  to  build  programs  to  manage  our  words, 
check  our  grammar,  search  the  World  Wide  Web,  and  translate  from  one  language 
to  another.  Programs  to  do  that  also  rely  on  the  theory  of  context-free  languages 
that  we  present  in  Part  III. 


A sketch of some of the main techniques used in natural language processing
can be found in Appendix L.


• Systems as diverse as parity checkers, vending machines, communication protocols,
and building security devices can be straightforwardly described as finite state machines,
which we'll describe in Chapter 5.


A vending machine is described in Example 5.1. A family of network communication
protocols is modeled as finite state machines in I.1. An example
of a simple building security system, modeled as a finite state machine, can be
found in J.1. An example of a finite state controller for a soccer-playing robot
can be found in P.4.


• Many interactive video games are (large, often nondeterministic) finite state
machines.


An example of the use of a finite state machine to describe a role playing
game can be found in N.3.1.

• DNA is the language of life. DNA molecules, as well as the proteins that they describe,
are strings that are made up of symbols drawn from small alphabets (nucleotides
and amino acids, respectively). So computational biologists exploit many
of the same tools that computational linguists use. For example, they rely on techniques
that are based on both finite state machines and context-free grammars.


For  a  very  brief  introduction  to  computational  biology  see  Appendix  K. 


• Security is perhaps the most important property of many computer systems. The
undecidability results that we present in Part IV show that there cannot exist a general
purpose method for automatically verifying arbitrary security properties of
programs. The complexity results that we present in Part V serve as the basis for
powerful encryption techniques.


For a proof of the undecidability of the correctness of a very simple security
model, see J.2. For a short introduction to cryptography, see J.3.


• Artificial intelligence programs solve problems in task domains ranging from medical
diagnosis to factory scheduling. Various logical frameworks have been proposed for
representing and reasoning with the knowledge that such programs exploit. The undecidability
results that we present in Part IV show that there cannot exist a general
theorem prover that can decide, given an arbitrary statement in first order logic,
whether or not that statement follows from the system’s axioms. The complexity results
that we present in Part V show that, if we back off to the far less expressive system of
Boolean (propositional) logic, while it becomes possible to decide the validity of a given
statement, it is not possible to do so, in general, in a reasonable amount of time.


For a discussion of the role of undecidability and complexity results in artificial
intelligence, see Appendix M. The same issues plague the development
of the Semantic Web. (I.3)


• Clearly documented and widely accepted standards play a pivotal role in modern
computing systems. Getting a diverse group of users to agree on a single standard is
never easy. But the undecidability and complexity results that we present in Parts IV
and V mean that, for some important problems, there is no single right answer for all
uses. Expressively weak standard languages may be tractable and decidable, but they
may simply be inadequate for some tasks. For those tasks, expressively powerful languages,
that give up some degree of tractability and possibly decidability, may be required.
The provable lack of a one-size-fits-all language makes the standards process
even more difficult and may require standards that allow alternatives.


We’ll see one example of this aspect of the standards process when we consider,
in I.3, the design of a description language for the Semantic Web.


• Many natural structures, including ones as different as organic molecules and computer
networks, can be modeled as graphs. The theory of complexity that we present
in Part V tells us that, while there exist efficient algorithms for answering some important
questions about graphs, other questions are “hard”, in the sense that no efficient
algorithm for them is known nor is one likely to be developed.


We'll discuss the role of graph algorithms in network analysis in I.2.


• The complexity results that we present in Part V contain a lot of bad news. There
are problems that matter yet for which no efficient algorithm is likely ever to be
found. But practical solutions to some of these problems exist. They rely on a variety
of approximation techniques that work pretty well most of the time.


An almost optimal solution to an instance of the traveling salesman problem
with 1,904,711 cities has been found, as we’ll see in Section 27.1. Randomized
algorithms can find prime numbers efficiently, as we'll see in
Section 30.2.4. Heuristic search algorithms find paths in computer games
(N.3.2) and move sequences for champion chess-playing programs. (N.2.5)


CHAPTER  2 


Languages  and  Strings 


In the theory that we are about to build, we are going to analyze problems by casting
them as instances of the more specific question, “Given some string s and some
language L, is s in L?” Before we can formalize what we mean by that, we need to
define our terms.

An alphabet, often denoted Σ, is a finite set. We will call the members of Σ symbols
or characters.


2.1  Strings 

A string is a finite sequence, possibly empty, of symbols drawn from some alphabet Σ.
Given any alphabet Σ, the shortest string that can be formed from Σ is the empty
string, which we will write as ε. The set of all possible strings over an alphabet Σ is written
Σ*. This notation exploits the Kleene star operator, which we will define more generally
below.


EXAMPLE  2.1  Alphabets 


Alphabet name          Alphabet symbols            Example strings

The English alphabet   {a, b, c, ..., z}           ε, aabbeg, aaaaa
The binary alphabet    {0, 1}                      ε, 0, 001100
A star alphabet        [three star symbols]        ε, [strings of star symbols]
A music alphabet       [musical note symbols]      ε, [a string of note symbols]

In running text, we will indicate literal symbols and strings by writing them like this.


2.1.2 Functions on Strings

The length of a string s, which we will write as |s|, is the number of symbols in s. For
example:


|ε| = 0

|1001101| = 7

For any symbol c and string s, we define the function #c(s) to be the number of times
that the symbol c occurs in s. So, for example, #a(abbaaa) = 4.

The concatenation of two strings s and t, written s || t or simply st, is the string formed
by appending t to s. For example, if x = good and y = bye, then xy = goodbye. So
|xy| = |x| + |y|.

The empty string, ε, is the identity for concatenation of strings. So ∀x (xε = εx = x).
Concatenation, as a function defined on strings, is associative. So ∀s, t, w ((st)w =
s(tw)).
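
The functions defined so far correspond closely to built-in string operations in most programming languages. The following is a minimal Python sketch, not taken from the text: the helper name count_symbol and the constant EPSILON are illustrative choices, and Python's len and + are assumed to stand in for length and concatenation.

```python
EPSILON = ""  # the empty string


def count_symbol(c: str, s: str) -> int:
    """Return #c(s): the number of times the symbol c occurs in s."""
    return sum(1 for symbol in s if symbol == c)


x, y = "good", "bye"

# Length: len(s) plays the role of |s|.
length_of_empty = len(EPSILON)
length_of_binary = len("1001101")

# Concatenation: x + y plays the role of xy.
xy = x + y

# The empty string is the identity for concatenation.
assert x + EPSILON == EPSILON + x == x

# |xy| = |x| + |y|
assert len(xy) == len(x) + len(y)

# Concatenation is associative: (st)w = s(tw).
s, t, w = "ab", "ba", "c"
assert (s + t) + w == s + (t + w)
```

The assertions only spot-check the stated properties on a few strings; they are not a proof.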

Next we define string replication. For each string w and each natural number i, the
string wⁱ is defined as:


w⁰ = ε


wⁱ⁺¹ = wⁱw


For example:

a³ = aaa
(bye)² = byebye
a⁰b³ = bbb
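
The recursive definition of replication can be transcribed almost literally. This is a sketch under the assumption that Python strings model strings over an alphabet; the function name replicate is hypothetical. Python's built-in w * i computes the same result and is used here only as a cross-check.

```python
def replicate(w: str, i: int) -> str:
    """Return w replicated i times, following the recursive definition."""
    if i < 0:
        raise ValueError("i must be a natural number")
    if i == 0:
        return ""                    # base case: w to the power 0 is the empty string
    return replicate(w, i - 1) + w   # recursive case: one more copy of w appended


# Cross-check against Python's built-in repetition operator.
for n in range(5):
    assert replicate("bye", n) == "bye" * n
```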

Finally we define string reversal. For each string w, the reverse of w, which we will
write wᴿ, is defined as:

If |w| = 0 then wᴿ = w = ε.

If |w| ≥ 1 then ∃a ∈ Σ (∃u ∈ Σ* (w = ua)), (i.e., the last character of w is a).
Then define wᴿ = auᴿ.


THEOREM  2.1  Concatenation  and  Reverse  of  Strings 

Theorem: If w and x are strings, then (wx)ᴿ = xᴿwᴿ.

For example, (nametag)ᴿ = (tag)ᴿ(name)ᴿ = gateman.

Proof: The proof is by induction on |x|:

Base case: |x| = 0. Then x = ε, and (wx)ᴿ = (wε)ᴿ = (w)ᴿ = εwᴿ = εᴿwᴿ = xᴿwᴿ.

Prove: ∀n ≥ 0 (((|x| = n) → ((wx)ᴿ = xᴿwᴿ)) → ((|x| = n + 1) → ((wx)ᴿ = xᴿwᴿ))).

Consider any string x, where |x| = n + 1. Then x = ua for some character a and
|u| = n. So:

(wx)ᴿ = (w(ua))ᴿ      rewrite x as ua
      = ((wu)a)ᴿ      associativity of concatenation
      = a(wu)ᴿ        definition of reversal
      = a(uᴿwᴿ)       induction hypothesis
      = (auᴿ)wᴿ       associativity of concatenation
      = (ua)ᴿwᴿ       definition of reversal
      = xᴿwᴿ          rewrite ua as x
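
As a sanity check on the definition of reversal and on Theorem 2.1, the recursive definition can be implemented directly and the identity tested exhaustively on short strings. This is an illustrative Python sketch; the names reverse, strings_up_to, and theorem_holds are hypothetical, and the alphabet {a, b} and the length bound are arbitrary assumptions. Exhaustive testing on a finite set complements the induction proof but does not replace it.

```python
from itertools import product


def reverse(w: str) -> str:
    """Return the reverse of w: if w = ua for a single symbol a, the result is a + reverse(u)."""
    if len(w) == 0:
        return w                 # the reverse of the empty string is itself
    u, a = w[:-1], w[-1]         # split w into u followed by its last symbol a
    return a + reverse(u)


def strings_up_to(alphabet: str, max_len: int):
    """Yield every string over alphabet whose length is at most max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)


def theorem_holds(alphabet: str = "ab", max_len: int = 4) -> bool:
    """Check that reverse(w + x) == reverse(x) + reverse(w) for all short w and x."""
    candidates = list(strings_up_to(alphabet, max_len))
    return all(
        reverse(w + x) == reverse(x) + reverse(w)
        for w in candidates
        for x in candidates
    )
```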

2.1.3  Relations  on  Strings 

A string s is a substring of a string t iff s occurs contiguously as part of t. For example:

aaa  is  a  substring  of  aaabbbaaa 

aaaaaa  is  not  a  substring  of  aaabbbaaa 

A string s is a proper substring of a string t iff s is a substring of t and s ≠ t. Every
string is a substring (although not a proper substring) of itself. The empty string, ε, is a
substring of every string.

A string s is a prefix of t iff ∃x ∈ Σ* (t = sx). A string s is a proper prefix of a string t
iff s is a prefix of t and s ≠ t. Every string is a prefix (although not a proper prefix) of
itself. The empty string, ε, is a prefix of every string. For example, the prefixes of abba
are: ε, a, ab, abb, abba.

A string s is a suffix of t iff ∃x ∈ Σ* (t = xs). A string s is a proper suffix of a string t
iff s is a suffix of t and s ≠ t. Every string is a suffix (although not a proper suffix) of itself.
The empty string, ε, is a suffix of every string. For example, the suffixes of abba are:
ε, a, ba, bba, abba.
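
The three relations can be computed directly from their definitions. The following Python sketch is illustrative only; the function names are hypothetical, and Python's in operator on strings is assumed to implement the contiguous-substring test.

```python
def is_substring(s: str, t: str) -> bool:
    """True iff s occurs contiguously as part of t."""
    return s in t


def is_proper_substring(s: str, t: str) -> bool:
    """True iff s is a substring of t and s differs from t."""
    return s in t and s != t


def prefixes(t: str) -> list[str]:
    """All prefixes of t, shortest first (the empty string is always included)."""
    return [t[:i] for i in range(len(t) + 1)]


def suffixes(t: str) -> list[str]:
    """All suffixes of t, shortest first (the empty string is always included)."""
    return [t[i:] for i in range(len(t), -1, -1)]


def is_proper_prefix(s: str, t: str) -> bool:
    """True iff t = sx for some string x and s differs from t."""
    return t.startswith(s) and s != t
```

A string of length n has n + 1 prefixes and n + 1 suffixes, which is why both list comprehensions range over n + 1 split points.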

2.2  Languages 

A language is a (finite or infinite) set of strings over a finite alphabet Σ. When we are
talking about more than one language, we will use the notation Σ_L to mean the alphabet
from which the strings in the language L are formed.


EXAMPLE  2.2  Defining  Languages  Given  an  Alphabet 

Let Σ = {a, b}. Σ* = {ε, a, b, aa, ab, ba, bb, aaa, aab, ...}.

Some examples of languages over Σ are:

∅, {ε}, {a, b}, {ε, a, aa, aaa, aaaa, aaaaa},

{ε, a, aa, aaa, aaaa, aaaaa, ...}
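
Although Σ* is infinite, its elements can be listed one at a time in order of increasing length. The Python generator below is a sketch of that enumeration; the name sigma_star is hypothetical, and the ordering within each length simply follows the order in which the alphabet is given.

```python
from itertools import count, islice, product


def sigma_star(alphabet):
    """Yield every string over alphabet, shortest first; the generator never terminates."""
    for n in count(0):                                # lengths 0, 1, 2, ...
        for symbols in product(alphabet, repeat=n):   # all strings of length n
            yield "".join(symbols)


# Take only a finite prefix of the infinite enumeration.
first_nine = list(islice(sigma_star(["a", "b"]), 9))
```

Because the enumeration is infinite, callers must bound it, as islice does here.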


2.2.2 Techniques for Defining Languages

We will use a variety of techniques for defining the languages that we wish to consider.
Since languages are sets, we can define them using any of the set-defining techniques
that are described in A.2. For example, we can specify a characteristic function, i.e., a
predicate that is True of every element in the set and False of everything else.


EXAMPLE 2.3 All a's Precede All b's

Let L = {w ∈ {a, b}* : all a’s precede all b’s in w}. The strings ε, a, aa, aabbb,
and bb are in L. The strings aba, ba, and abc are not in L. Notice that some strings
trivially satisfy the requirement for membership in L. The rule says nothing about
there having to be any a’s or any b’s. All it says is that any a’s there are must come
before all the b’s (if any). If there are no a’s or no b’s, then there can be none that
violate the rule. So the strings ε, a, aa, and bb trivially satisfy the rule and are in L.
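
A characteristic function for this language can be written as a short predicate. The sketch below is one possible formulation, with the hypothetical name a_before_b; it relies on the observation that, over the alphabet {a, b}, some b precedes some a exactly when the two-symbol string ba occurs somewhere in w.

```python
def a_before_b(w: str) -> bool:
    """Characteristic function of L: True iff w is over {a, b} and no b precedes an a."""
    if any(symbol not in "ab" for symbol in w):
        return False            # w is not a string over the alphabet {a, b}
    return "ba" not in w        # a violation would show up as an adjacent "ba"
```

The empty string and strings made up of only a's or only b's pass both tests, which is the trivial satisfaction described in the example.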


EXAMPLE  2.4  Strings  That  End  in  a 

Let L = {x : ∃y ∈ {a, b}* (x = ya)}. The strings a, aa, aaa, bbaa, and ba are in L.
The strings ε, bab, and bca are not in L. L consists of all strings that can be
formed by taking some string in {a, b}* and concatenating a single a onto the
end of it.
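
The existential quantifier in this definition does not require a search: x = ya for some y in {a, b}* exactly when x is a string over {a, b} whose last symbol is a. A minimal Python sketch, with the hypothetical name ends_in_a:

```python
def ends_in_a(x: str) -> bool:
    """True iff x = ya for some string y over {a, b}."""
    over_alphabet = all(symbol in "ab" for symbol in x)
    return over_alphabet and x.endswith("a")
```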


EXAMPLE  2.5  The  Perils  of  Using  English  to  Describe  Languages 

Let L = {x#y : x, y ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}* and, when x and y are viewed as
the decimal representations of natural numbers, square(x) = y}. The strings 3#9
and 12#144 are in L. The strings 3#8, 12, and 12#12#12 are not in L. But what
about the string #? Is it in L? It depends on what we mean by the phrase, “when x
and y are viewed as the decimal representations of natural numbers.” Is ε the decimal
representation of some natural number? It is possible that an algorithm that
converts strings to numbers might convert ε to 0. In that case, since 0 is the square
of 0, # is in L. If, on the other hand, the string-to-integer converter fails to accept ε
as a valid input, # is not in L. This example illustrates the dangers of using English
descriptions of sets. They are sometimes ambiguous. We will strive to use only
unambiguous terms. We will also, as we discuss below, develop other definitional
techniques that do not present this problem.
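
The ambiguity can be made concrete with two string-to-number converters that differ only in how they treat ε. This is a sketch under assumptions: the names to_number_strict, to_number_lenient, and in_square_language are hypothetical, chosen here for illustration.

```python
def to_number_strict(s):
    """Return the natural number that s denotes, or None if s is empty or has a non-digit."""
    return int(s) if s != "" and all(c in "0123456789" for c in s) else None

def to_number_lenient(s):
    """Like to_number_strict, except that the empty string is read as 0."""
    return 0 if s == "" else to_number_strict(s)

def in_square_language(w, to_number):
    """Recognizer for {x#y : square(x) = y}, relative to a chosen converter."""
    parts = w.split("#")
    if len(parts) != 2:          # exactly one # must separate x from y
        return False
    x, y = to_number(parts[0]), to_number(parts[1])
    return x is not None and y is not None and x * x == y
```

Both converters agree on 3#9, 12#144, and 3#8. They disagree only on #, which is exactly the case the English description leaves open.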
EXAMPLE 2.6 The Empty Language

Let L = { } = ∅. L is the language that contains no strings.


EXAMPLE  2.7  The  Empty  Language  is  Different  From  the  Empty  String 

Let L = {ε}, the language that contains a single string, ε. Note that L is different
from ∅.


All  of  the  examples  we  have  considered  so  far  fit  the  definition  that  we  are 
using  for  the  term  language:  a  set  of  strings.  They're  quite  different,  though,  from 
the  everyday  use  of  the  term.  Everyday  languages  are  also  languages  under  our 
definition. 


EXAMPLE  2.8  English  Isn't  a  Well-Defined  Language 

Let L = {w : w is a sentence in English}.

Examples:

Kerry hit the ball.                        /* Clearly in L.

Colorless green ideas sleep furiously.⁴    /* The syntax is correct but what could it mean?

The window needs fixed.                    /* In some dialects of L.

Ball the Stacy hit blue.                   /* Clearly not in L.


The problem with languages like English is that there is no clear agreement on
what strings they contain. We will not be able to apply the theory that we are
about to build to any language for which we cannot first produce a formal
specification. Natural languages, like English or Spanish or Chinese, while hard to
specify, are of great practical importance, though. As a result, substantial effort has
been expended in creating formal and computationally effective descriptions of
them that are good enough to be used as the basis for applications such as
grammar checking and text database retrieval.


To  the  extent  that  formal  descriptions  of  natural  languages  like  English  can 
be  created,  the  theory  that  we  are  about  to  develop  can  be  applied,  as  we  will 
see  in  Parts  II  and  III  and  Appendix  L. 


4 This classic example of a syntactically correct but semantically anomalous sentence is from [Chomsky 1957].
EXAMPLE 2.9 A Halting Problem Language

Let L = {w : w is a C program that halts on all inputs}. L is substantially more
complex than, for example, {x ∈ {a, b}* : all a's precede all b's}. But, unlike
English, there does exist a clear formal specification of it. The theory that we are
about to build will tell us something very useful about L.


We  can  use  the  relations  that  we  have  defined  on  strings  as  a  way  to  define 
languages. 


EXAMPLE  2.10  Using  the  Prefix  Relation 

We define the following languages in terms of the prefix relation on strings:

L1 = {w ∈ {a, b}* : no prefix of w contains b}
   = {ε, a, aa, aaa, aaaa, aaaaa, aaaaaa, ...}.

L2 = {w ∈ {a, b}* : no prefix of w starts with b}
   = {w ∈ {a, b}* : the first character of w is a} ∪ {ε}.

L3 = {w ∈ {a, b}* : every prefix of w starts with b}
   = ∅.

L3 is equal to ∅ because ε is a prefix of every string. Since ε does not start with
b, no strings meet L3's requirement.
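
These three definitions can be checked mechanically on all strings up to some small length. The sketch below assumes a bound of 3 on string length (the languages themselves are unbounded), and the helper names prefixes and strings_up_to are hypothetical.

```python
from itertools import product

def prefixes(w):
    """All prefixes of w, including the empty string and w itself."""
    return [w[:i] for i in range(len(w) + 1)]

def strings_up_to(alphabet, max_len):
    """All strings over alphabet of length at most max_len, shortest first."""
    return ["".join(t) for n in range(max_len + 1) for t in product(alphabet, repeat=n)]

def in_L1(w): return not any("b" in p for p in prefixes(w))
def in_L2(w): return not any(p.startswith("b") for p in prefixes(w))
def in_L3(w): return all(p.startswith("b") for p in prefixes(w))

sample = strings_up_to("ab", 3)
L1 = [w for w in sample if in_L1(w)]
L2 = [w for w in sample if in_L2(w)]
L3 = [w for w in sample if in_L3(w)]   # always empty: the prefix "" never starts with b
```

On this sample, L1 contains only strings of a's, L2 contains ε plus every string whose first character is a, and L3 is empty.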


Recall that we defined the replication operator on strings: For any string s and
integer n, s^n = n copies of s concatenated together. For example, (bye)^2 = byebye. We
can use replication as a way to define a language, rather than a single string, if we allow
n to be a variable, rather than a specific constant.


EXAMPLE  2.11  Using  Replication  to  Define  a  Language 
Let L = {a^n : n ≥ 0}. L = {ε, a, aa, aaa, aaaa, aaaaa, ...}.


Languages are sets. So, if we want to provide a computational definition of a
language, we could specify either:

• a language generator, which enumerates (lists) the elements of the language, or

• a language recognizer, which decides whether or not a candidate string is in the
language and returns True if it is and False if it isn't.
For example, the logical definition, L = {x : ∃y ∈ {a, b}* (x = ya)} can be turned
into either a language generator (enumerator) or a language recognizer.
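
As an illustration, here is one possible recognizer and one possible generator for this language in Python. The names recognize and generate, and the choice to generate shortest strings first, are assumptions made for this sketch.

```python
from itertools import count, islice, product

def recognize(x):
    """Recognizer: True iff x = ya for some y in {a, b}*."""
    return all(c in "ab" for c in x) and x.endswith("a")

def generate():
    """Generator: yield ya for every y in {a, b}*, taking shorter y first."""
    for n in count(0):
        for y in product("ab", repeat=n):
            yield "".join(y) + "a"

# The generator never terminates on its own, so take a finite slice of it.
first_seven = list(islice(generate(), 7))
```

Every string the generator produces is accepted by the recognizer, and every string the recognizer accepts is eventually produced by the generator.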

In some cases, when considering an enumerator for a language we may care
about the order in which the elements of L are generated. If there exists a total order
D of the elements of Σ_L (as there does, for example, on the letters of the Roman
alphabet or the symbols for the digits 0 - 9), then we can use D to define on L a useful total
order called lexicographic order (written <_L):

• Shorter strings precede longer ones: ∀x (∀y ((|x| < |y|) → (x <_L y))), and

• Of strings that are the same length, sort them in dictionary order using D.

When we use lexicographic order in the rest of this book, we will assume that D is
the standard sort order on letters and numerals. If D is not obvious, we will state it.

We will say that a program lexicographically enumerates the elements of L iff it
enumerates them in lexicographic order.


EXAMPLE  2.12  Lexicographic  Enumeration 

Let L = {x ∈ {a, b}* : all a's precede all b's}. The lexicographic enumeration
of L is:

ε, a, b, aa, ab, bb, aaa, aab, abb, bbb, aaaa, aaab, aabb, abbb, bbbb, aaaaa, ...
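
One simple way to obtain a lexicographic enumerator is to walk through Σ* in lexicographic order and keep only the strings that a recognizer accepts. The sketch below takes that approach; the function names are hypothetical, and the alphabet is assumed to be passed in the order D.

```python
from itertools import count, islice, product

def lexicographic_enumeration(alphabet, in_language):
    """Yield the members of a language in lexicographic order.

    alphabet lists the symbols in the order D; in_language is a recognizer.
    For a finite language the loop keeps searching forever after the last member.
    """
    for n in count(0):                          # shorter strings come first
        for t in product(alphabet, repeat=n):   # dictionary order within one length
            w = "".join(t)
            if in_language(w):
                yield w

def all_as_precede_bs(w):
    return "ba" not in w

first = list(islice(lexicographic_enumeration("ab", all_as_precede_bs), 16))
```

With the alphabet given as "ab", the first sixteen strings produced are the ones listed in the example.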


In Parts II, III, and IV of this book, we will consider a variety of formal techniques
for specifying both generators (enumerators) and recognizers for various classes of
languages.

2.2.3  What  is  the  Cardinality  of  a  Language? 

How large is a language? The smallest language over any alphabet is ∅, whose
cardinality is 0. The largest language over any alphabet Σ is Σ*. What is |Σ*|? Suppose that
Σ = ∅. Then Σ* = {ε} and |Σ*| = 1. But what about the far more useful case in
which Σ is not empty?

THEOREM 2.2 The Cardinality of Σ*

Theorem: If Σ ≠ ∅ then Σ* is countably infinite.

Proof: The elements of Σ* can be lexicographically enumerated by a straightforward
procedure that:

• Enumerates all strings of length 0, then length 1, then length 2, and so forth.

• Within the strings of a given length, enumerates them in dictionary order.

This enumeration is infinite since there is no longest string in Σ*. By Theorem A.1,
since there exists an infinite enumeration of Σ*, it is countably infinite.

Since any language over Σ is a subset of Σ*, the cardinality of every language is at
least 0 and at most ℵ0. So all languages are either finite or countably infinite.

2.2.4  How  Many  Languages  Are  There? 

Let Σ be an alphabet. How many different languages are there that are defined on Σ?
The set of languages defined on Σ is 𝒫(Σ*), the power set of Σ*, or the set of all
subsets of Σ*. If Σ = ∅ then Σ* is {ε} and 𝒫(Σ*) is {∅, {ε}}. But, again, what about the
useful case in which Σ is not empty?

THEOREM  2.3  An  Uncountably  Infinite  Number  of  Languages 

Theorem: If Σ ≠ ∅ then the set of languages over Σ is uncountably infinite.

Proof: The set of languages defined on Σ is 𝒫(Σ*). By Theorem 2.2, Σ* is countably
infinite. By Theorem A.4, if S is a countably infinite set, 𝒫(S) is uncountably
infinite. So 𝒫(Σ*) is uncountably infinite.

2.2.5  Functions  on  Languages 

Since  languages  are  sets,  all  of  the  standard  set  operations  are  well-defined  on  languages. 
In  particular,  we  will  find  union,  intersection,  difference,  and  complement  to  be  useful. 
Complement  will  be  defined  with  2*  as  the  universe  unless  we  explicitly  state  otherwise. 


EXAMPLE  2.13  Set  Functions  Applied  to  Languages 
Let: Σ = {a, b}.

L1 = {strings with an even number of a's}.

L2 = {strings with no b's} = {ε, a, aa, aaa, aaaa, aaaaa, aaaaaa, ...}.

L1 ∪ L2 = {all strings of just a's plus strings that contain b's and an even
number of a's}.

L1 ∩ L2 = {ε, aa, aaaa, aaaaaa, aaaaaaaa, ...}.

L2 - L1 = {a, aaa, aaaaa, aaaaaaa, ...}.

¬(L2 - L1) = {strings with at least one b} ∪ {strings with an even number
of a's}.
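
Because these languages are infinite, a program can only inspect a finite slice of them. The following sketch restricts attention to strings of length at most 4 (the bound MAX_LEN is an assumption of the sketch) and uses Python's built-in set operations; the complement is taken relative to that bounded slice of Σ*.

```python
from itertools import product

MAX_LEN = 4  # assumed bound; the languages in the example are infinite
universe = {"".join(t) for n in range(MAX_LEN + 1) for t in product("ab", repeat=n)}

L1 = {w for w in universe if w.count("a") % 2 == 0}   # even number of a's
L2 = {w for w in universe if "b" not in w}            # no b's

union = L1 | L2
intersection = L1 & L2
difference = L2 - L1
complement_of_difference = universe - difference      # complement within the bounded universe
```

Within the bound, the intersection holds the even-length strings of a's and the difference holds the odd-length ones, as in the example.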

Because languages are sets of strings, it makes sense to define operations on them in
terms of the operations that we have already defined on strings. Three useful ones to
consider are concatenation, Kleene star, and reverse.

Let L1 and L2 be two languages defined over some alphabet Σ. Then their
concatenation, written L1L2 is:

L1L2 = {w ∈ Σ* : ∃s ∈ L1 (∃t ∈ L2 (w = st))}.
EXAMPLE 2.14 Concatenation of Languages

Let: L1 = {cat, dog, mouse, bird}.

L2 = {bone, food}.

L1L2 = {catbone, catfood, dogbone, dogfood, mousebone, mousefood,
birdbone, birdfood}.

The language {ε} is the identity for concatenation of languages. So, for all languages
L, L{ε} = {ε}L = L.

The language ∅ is a zero for concatenation of languages. So, for all languages L,
L∅ = ∅L = ∅. That ∅ is a zero follows from the definition of the concatenation of
two languages as the set consisting of all strings that can be formed by selecting some
string s from the first language and some string t from the second language and then
concatenating them together. There are no ways to select a string from the empty set.

Concatenation, as a function defined on languages, is associative. So, for all
languages L1, L2, and L3:

((L1L2)L3 = L1(L2L3)).
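
For finite languages, concatenation and the three properties just stated can be checked with a few lines of Python. This is a sketch; the names concat, EPSILON, and EMPTY are introduced here for illustration only.

```python
def concat(L1, L2):
    """Concatenation of two finite languages, each given as a set of strings."""
    return {s + t for s in L1 for t in L2}

animals = {"cat", "dog", "mouse", "bird"}
treats = {"bone", "food"}
sounds = {"!", "?"}

EPSILON = {""}   # the language that contains only the empty string
EMPTY = set()    # the language that contains no strings

pairs = concat(animals, treats)   # the eight strings catbone, catfood, ...
```

Concatenating with EPSILON leaves a language unchanged, while concatenating with EMPTY yields EMPTY, because the set comprehension has no string to select from the empty set.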

It is important to be careful when concatenating languages that are defined using
replication. Recall that we used the notation {a^n : n ≥ 0} to mean the set of strings
composed of zero or more a's. That notation is a shorthand for a longer, perhaps clearer
expression, {w : ∃n ≥ 0 (w = a^n)}. In this form, it is clear that n is a variable bound by
an existential quantifier. We will use the convention that the scope of such quantifiers is
the entire expression in which they occur. So multiple occurrences of the same variable
letter are the same variable and must take on the same value. Suppose that L1 =
{a^n : n ≥ 0} and L2 = {b^n : n ≥ 0}. By the definition of language concatenation,
L1L2 = {w : w consists of a (possibly empty) a region followed by a (possibly empty)
b region}. L1L2 ≠ {a^n b^n : n ≥ 0}, since every string in {a^n b^n : n ≥ 0} must have the
same number of b's as a's. The easiest way to avoid confusion is simply to rename
conflicting variables before attempting to concatenate the expressions that contain them.
So L1L2 = {a^n b^m : n, m ≥ 0}. In Chapter 6 we will define a convenient notation that
will let us write this as a*b*.
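
The difference between the two expressions shows up immediately if the exponents are bounded and the sets are written out. In this sketch the bound N = 3 is an assumption made so that the sets are finite.

```python
N = 3  # assumed bound on the exponents

L1 = {"a" * n for n in range(N + 1)}   # {a^n : 0 <= n <= N}
L2 = {"b" * n for n in range(N + 1)}   # {b^n : 0 <= n <= N}

# Concatenation picks s and t independently, so the two exponents may differ.
concatenation = {s + t for s in L1 for t in L2}

# Reusing one variable n forces the two exponents to be equal.
same_exponent = {"a" * n + "b" * n for n in range(N + 1)}
```

The string aab, for instance, is in the concatenation but not in the set built with a single shared exponent.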

Let L be a language defined over some alphabet Σ. Then the Kleene star of L,
written L* is:

L* = {ε} ∪ {w ∈ Σ* : ∃k ≥ 1 (∃w1, w2, ..., wk ∈ L (w = w1w2...wk))}.

In  other  words,  L*  is  the  set  of  strings  that  can  be  formed  by  concatenating  together 
zero  or  more  strings  from  L. 

EXAMPLE  2.15  Kleene  Star 

Let L = {dog, cat, fish}. Then:

L* = {ε, dog, cat, fish, dogdog, dogcat, ...,

fishdog, ..., fishcatfish, fishdogfishcat, ...}.
EXAMPLE 2.16 Kleene Star, Again

Let L = {w ∈ {a, b}* : #a(w) is odd and #b(w) is even}. Then every string in L* has an
even number of b's, so L* ⊆ {w ∈ {a, b}* : #b(w) is even}. The parity constraint on the
number of a's disappears in the description of L* because strings in L* are formed by
concatenating together any number of strings from L. If an odd number of strings are
concatenated together, the result will contain an odd number of a's. If an even number
are used, the result will contain an even number of a's. The containment is proper,
though: bb, for example, has an even number of b's but cannot be built from strings
in L, since every string in L contains at least one a.


L* always contains an infinite number of strings as long as L is not equal to either ∅
or {ε} (i.e., as long as there is at least one nonempty string any number of which can be
concatenated together). If L = ∅, then L* = {ε}, since there are no strings that could
be concatenated to ε to make it longer. If L = {ε}, then L* is also {ε}.
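
Since L* is usually infinite, a program can list only the part of it below some length bound. The sketch below does that for a finite language L; the function name star_up_to and the bound parameter are assumptions of the sketch.

```python
def star_up_to(L, max_len):
    """Strings of L* whose length is at most max_len, for a finite language L."""
    result = {""}        # concatenating zero strings gives the empty string
    frontier = {""}
    while frontier:
        # Extend each newly found string by one more string from L.
        frontier = {u + s for u in frontier for s in L
                    if len(u + s) <= max_len} - result
        result |= frontier
    return result
```

For L = ∅ and for L = {ε} the loop adds nothing, so the result is {ε} in both cases, matching the two special cases described above.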

It  is  sometimes  useful  to  require  that  at  least  one  element  of  L  be  selected.  So  we 
define: 

L+  =  LL*. 

Another way to describe L+ is that it is the closure of L under concatenation. Note
that L+ = L* - {ε} iff ε ∉ L.


EXAMPLE  2.17  L+ 

Let L = {0, 1}+ be the set of binary strings. L does not include ε.


Let L be a language defined over some alphabet Σ. Then the reverse of L, written
L^R is:

L^R = {w ∈ Σ* : w = x^R for some x ∈ L}.

In other words, L^R is the set of strings that can be formed by taking some string in L
and reversing it.

Since  we  have  defined  the  reverse  of  a  language  in  terms  of  the  definition  of  reverse 
applied  to  strings,  we  expect  it  to  have  analogous  properties. 

THEOREM  2.4  Concatenation  and  Reverse  of  Languages 

Theorem: If L1 and L2 are languages, then (L1L2)^R = L2^R L1^R.

Proof: If x and y are strings, then ∀x (∀y ((xy)^R = y^R x^R))     Theorem 2.1

(L1L2)^R = {(xy)^R : x ∈ L1 and y ∈ L2}     Definition of concatenation of languages

         = {y^R x^R : x ∈ L1 and y ∈ L2}     Lines 1 and 2

         = L2^R L1^R                         Definition of concatenation of languages
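
The theorem can be spot-checked on small finite languages. The check below is only a sanity test on one pair of hypothetical languages, not a substitute for the proof.

```python
def reverse(L):
    """Reverse of a finite language: reverse every string in it."""
    return {w[::-1] for w in L}

def concat(L1, L2):
    """Concatenation of two finite languages."""
    return {s + t for s in L1 for t in L2}

L1 = {"ab", "c"}
L2 = {"de", "", "f"}

lhs = reverse(concat(L1, L2))             # (L1 L2)^R
rhs = concat(reverse(L2), reverse(L1))    # L2^R L1^R
```

Note that the order of the two languages flips on the right-hand side; concatenating the reversed languages in the original order gives a different set for this pair.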
We have now defined the two important data types, string and language, that we will
use throughout this book. In the next chapter, we will see how we can use them to
define a framework that will enable us to analyze computational problems of all sorts
(not just ones you may naturally think of in terms of strings).

2.2.6  Assigning  Meaning  to  the  Strings  of  a  Language 

Sometimes we are interested in viewing a language just as a set of strings. For
example, we'll consider some important formal properties of the language we'll call
A^nB^n = {a^n b^n : n ≥ 0}. In other words, A^nB^n is the language composed of all strings of
a's and b's such that all the a's come first and the number of a's equals the number of
b's. We won't attempt to assign meanings to any of those strings.

But some languages are useful precisely because their strings do have meanings. We
use natural languages like English and Chinese because they allow us to communicate
ideas. A program in a language like Java or C++ or Perl also has a meaning. In the case
of a programming language, one way to define meaning is in terms of some other
(typically closer to machine architecture) language. So, for example, the meaning of a Java
program can be described as a Java Virtual Machine program. An alternative is to
define a program's meaning in a logical language.

Philosophers and linguists (and others) have spent centuries arguing about what
sentences in natural languages like English (or Sanskrit or whatever) mean. We won't
attempt to solve that problem here. But if we are going to work with formal languages,
we need a precise way to map each string to its meaning (also called its semantics).
We'll call a function that assigns meanings to strings a semantic interpretation
function. Most of the languages we'll be concerned with are infinite because there is no
bound on the length of the strings that they contain. So it won't, in general, be possible
to define meanings by a table that pairs each string with its meaning.

We must instead define a function that knows the meanings of the language's basic
units and can combine those meanings, according to some fixed set of rules, to build
meanings for larger expressions. We call such a function, which can be said to
“compose” the meanings of simpler constituents into a single meaning for a larger
expression, a compositional semantic interpretation function. There arguably exists a mostly
compositional semantic interpretation function for English. Linguists fight about the
gory details of what such a function must look like. Everyone agrees that words have
meanings and that one can build a meaning for a simple sentence by combining the
meanings of the subject and the verb. For example, speakers of English would have no
trouble assigning a meaning to the sentence, “I gave him the fizding,” provided that
they are told what the meaning of the word “fizding” is. Everyone also agrees that the
meaning of idioms, like “I'm going to give him a piece of my mind,” cannot be derived
compositionally. Some other issues are more subtle.


Languages whose strings have meaning pervade computing and its applications.
Boolean logic and first-order logic are languages. Programming languages
are languages. (G.1) Network protocols are languages. (I.1) Database
query languages are languages. (Q.1.1) HTML is a language for defining
Web pages. (Q.1.2) XML is a more general language for marking up data.
(Q.1.2) OWL is a language for defining the meaning of tags on the Web.
(I.3.6) BNF is a language that can be used to specify the syntax of other
languages. (G.1.1) DNA is a language for describing proteins. (K.1.2) Music is a
language based on sound. (N.1)


When we define a formal language for a specific purpose, we design it so that there
exists a compositional semantic interpretation function. So, for example, there exist
compositional semantic interpretation functions for programming languages like Java
and C++. There exists a compositional semantic interpretation function for the
language of Boolean logic. It is specified by the truth tables that define the meanings of
whichever operators (e.g., ∧, ∨, ¬, and →) are allowed.

One  significant  property  of  semantic  interpretation  functions  for  useful  languages  is 
that  they  are  generally  not  one-to-one.  Consider: 

• English: The sentences, “Chocolate, please,” “I'd like chocolate,” “I'll have
chocolate,” and “I guess chocolate today,” all mean the same thing, at least in the context
of ordering an ice cream cone.

• Java: The following chunks of code all do the same thing:

int x = 4;    int x = 4;    int x = 4;    int x = 4;

x++;          ++x;          x = x + 1;    x = x - -1;

The semantic interpretation functions that we will describe later in this book, for
example for the various grammar formalisms that we will introduce, will not be
one-to-one either.


Exercises 

1. Consider the language L = {1^n 2^n : n > 0}. Is the string 122 in L?

2. Let L1 = {a^n b^n : n > 0}. Let L2 = {c^n : n > 0}. For each of the following
strings, state whether or not it is an element of L1L2:

a. ε.

b. aabbcc.

c. abbcc.

d. aabbcccc.

3. Let L1 = {peach, apple, cherry} and L2 = {pie, cobbler, ε}. List the
elements of L1L2 in lexicographic order.

4. Let L = {w ∈ {a, b}* : |w| ≡_3 0} (that is, the length of w is divisible by 3). List the
first six elements in a lexicographic enumeration of L.

5. Consider the language L of all strings drawn from the alphabet {a, b} with at
least two different substrings of length 2.

a. Describe L by writing a sentence of the form L = {w ∈ Σ* : P(w)}, where Σ
is a set of symbols and P is a first-order logic formula. You may use the
function |s| to return the length of s. You may use all the standard relational
symbols (e.g., =, <, etc.), plus the predicate Substr(s, t), which is True iff s is a
substring of t.

b. List the first six elements of a lexicographic enumeration of L.

6. For each of the following languages L, give a simple English description. Show
two strings that are in L and two that are not (unless there are fewer than two
strings in L or two not in L, in which case show as many as possible).

a. L = {w ∈ {a, b}* : exactly one prefix of w ends in a}.

b. L = {w ∈ {a, b}* : all prefixes of w end in a}.

c. L = {w ∈ {a, b}* : ∃x ∈ {a, b}+ (w = axa)}.

7. Are the following sets closed under the following operations? If not, what are
their respective closures?

a. The language {a, b} under concatenation.

b. The odd length strings over the alphabet {a, b} under Kleene star.

c. L = {w ∈ {a, b}*} under reverse.

d. L = {w ∈ {a, b}* : w starts with a} under reverse.

e. L = {w ∈ {a, b}* : w ends in a} under concatenation.

8. For each of the following statements, state whether it is True or False. Prove your
answer.

a. ∀L1, L2 (L1 = L2 iff L1* = L2*).

b. (∅ ∪ ∅*) ∩ (¬∅ - (∅∅*)) = ∅ (where ¬∅ is the complement of ∅).

c. Every infinite language is the complement of a finite language.

d. ∀L ((L^R)^R = L).

e. ∀L1, L2 ((L1L2)* = L1*L2*).

f. ∀L1, L2 ((L1*L2*L1*)* = (L2 ∪ L1)*).

g. ∀L1, L2 ((L1 ∪ L2)* = L1* ∪ L2*).

h. ∀L1, L2, L3 ((L1 ∪ L2)L3 = (L1L3) ∪ (L2L3)).

i. ∀L1, L2, L3 ((L1L2) ∪ L3 = (L1 ∪ L3)(L2 ∪ L3)).

j. ∀L ((L+)* = L*).

k. ∀L (∅L* = {ε}).

l. ∀L (∅ ∪ L+ = L*).

m. ∀L1, L2 ((L1 ∪ L2)* = (L2 ∪ L1)*).
The Big Picture: A Language
Hierarchy

Our  goal,  in  the  rest  of  this  book,  is  to  build  a  framework  that  lets  us  examine  a 
new  problem  and  be  able  to  say  something  about  how  intrinsically  difficult  it 
is.  In  order  to  do  this,  we  need  to  be  able  to  compare  problems  that  appear,  at 
first  examination,  to  be  wildly  different.  Apples  and  oranges  come  to  mind.  So  the  first 
thing  we  need  to  do  is  to  define  a  single  framework  into  which  any  computational 
problem can be cast. Then we will be in a position to compare problems and to
distinguish between those that are relatively easy to solve and those that are not.

3.1  Defining  the  Task:  Language  Recognition 

The  unifying  framework  that  we  will  use  is  language  recognition.  Assume  that  we  are 
given: 

• The definition of a language L. (We will consider about half a dozen different
techniques for providing this definition.)

•  A  string  w. 


Then we must answer the question: “Is w in L?” This question is an instance of a
more general class that we will call decision problems. A decision problem is simply a
problem that requires a yes or no answer.

In the rest of this book, we will discuss programs to solve decision problems
specifically of the form, “Is w in L?” We will see that, for some languages, a very simple
program suffices. For others, a more complex one is required. For still others, we will prove
that no program can exist.
3.2 The Power of Encoding

The question that we are going to ask, “Is w in L?” may seem, at first glance, way too
limited to be useful. What about problems like multiplying numbers, sorting lists, and
retrieving values from a database? And what about real problems like air traffic control
or inventory management? Can our theory tell us anything interesting about them?

The answer is yes and the key is encoding. With an appropriate encoding, other
kinds of problems can be recast as the problem of deciding whether a string is in a
language. We will show some examples to illustrate this idea. We will divide the examples
into two categories:

• Problems that are already stated as decision problems. For these, all we need to do
is to encode the inputs as strings and then define a language that contains exactly
the set of inputs for which the desired answer is yes.

• Problems that are not already stated as decision problems. These problems may
require results of any type. For these, we must first reformulate the problem as a
decision problem and then encode it as a language recognition task.

3.2.1  Everything  is  a  String 

Our stated goal is to build a theory of computation. What we are actually about to
build is a theory specifically of languages and strings. Of course, in a computer's
memory, everything is a (binary) string. So, at that level, it is obvious that restricting our
attention to strings does not limit the scope of our theory. Often, however, we will find it
easier to work with languages with larger alphabets.

Each time we consider a new problem, our first task will be to describe it in terms of
strings. In the examples that follow, and throughout the book, we will use the notation
<X> to mean a string encoding of some object X. We'll use the notation <X, Y> to
mean the encoding, into a single string, of the two objects X and Y.

The  first  three  examples  we'll  consider  are  of  problems  that  are  naturally  described 
in  terms  of  strings.  Then  we'll  look  at  examples  where  we  must  begin  by  constructing 
an  appropriate  string  encoding. 

EXAMPLE  3.1  Pattern  Matching  on  the  Web 

• Problem: Given a search string w and a web document d, do they match? In
other words, should a search engine, on input w, consider returning d?

• The language to be decided: {<w, d> : d is a candidate match for the query w}.


EXAMPLE  3.2  Question-Answering  on  the  Web 

• Problem: Given an English question q and a web document d (which may be
in English or Chinese), does d contain the answer to q?

• The language to be decided: {<q, d> : d contains the answer to q}.
The techniques that we will describe in the rest of this book are widely used
in the construction of systems that work with natural language (e.g., English
or Spanish or Chinese) text and speech inputs. (Appendix L)


EXAMPLE  3.3  Does  a  Program  Always  Halt? 

•  Problem:  Given  a  program  p,  written  in  some  standard  programming  language, 
is  p  guaranteed  to  halt  on  all  inputs? 

• The language to be decided: HP_ALL = {p : p halts on all inputs}.


A procedure that could decide whether or not a string is in HP_ALL could be
an important part of a larger system that proves the correctness of a
program. Unfortunately, as we will see in Theorem 21.3, no such procedure can
exist.


EXAMPLE  3.4  Primality  Testing 

• Problem: Given a nonnegative integer n, is it prime? In other words, is n greater
than 1, with no positive integer factors other than itself and 1?

• An instance of the problem: Is 9 prime?

• Encoding of the problem: We need a way to encode each instance. We will
encode each nonnegative integer as a binary string.

• The language to be decided: PRIMES = {w : w is the binary encoding of a
prime number}.
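
A recognizer for PRIMES can be sketched by decoding the binary string and applying trial division. This is an illustration only, and far from the most efficient method; the function name in_primes and the convention that encodings have no leading zeros are assumptions made here.

```python
def in_primes(w):
    """Recognizer for PRIMES: True iff w is the binary encoding of a prime number."""
    # Assumption: an encoding is a nonempty string over {0, 1} without leading zeros.
    if w == "" or any(c not in "01" for c in w) or (len(w) > 1 and w[0] == "0"):
        return False
    n = int(w, 2)
    if n < 2:                  # 0 and 1 are not prime
        return False
    d = 2
    while d * d <= n:          # trial division up to the square root of n
        if n % d == 0:
            return False
        d += 1
    return True
```

For the instance in the example, 9 is encoded as 1001, and the recognizer rejects it because 3 divides 9.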


Prime  numbers  play  an  important  role  in  modern  cryptography  systems.  (J.3) 
We’ll  discuss  the  complexity  of  PRIMES  in  Section  28.1.7  and  again  in 
Section  30.2.4. 


EXAMPLE  3.5  Verifying  Addition 

•  Problem:  Verify  the  correctness  of  the  addition  of  two  numbers. 

• Encoding of the problem: We encode each of the numbers as a string of decimal
digits. Each instance of the problem is a string of the form:

<integer1> + <integer2> = <integer3>.

• The language to be decided:

INTEGERSUM = {w of the form: <integer1> + <integer2> = <integer3> :
each of the substrings <integer1>, <integer2> and <integer3> is an element of
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}+ and integer3 is the sum of integer1 and integer2}.

• Examples of strings in L: 2 + 4 = 6    23 + 47 = 70.

• Examples of strings not in L: 2 + 4 = 10    2 + 4.
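
Membership in INTEGERSUM can be tested by first checking the form of the string and then checking the arithmetic. The sketch below assumes a concrete syntax in which optional spaces may surround + and =, as in the examples; that choice, and the names PATTERN and in_integersum, are assumptions of the sketch.

```python
import re

# Assumed concrete syntax: digits, '+', digits, '=', digits, with optional spaces
# around the two operators (the examples in the text are written with spaces).
PATTERN = re.compile(r"([0-9]+) *\+ *([0-9]+) *= *([0-9]+)")

def in_integersum(w):
    """Recognizer for INTEGERSUM under the assumed concrete syntax."""
    m = PATTERN.fullmatch(w)
    if m is None:              # w does not have the required form
        return False
    x, y, z = (int(g) for g in m.groups())
    return x + y == z
```

The string 2 + 4 is rejected at the first step because it lacks the required form, while 2 + 4 = 10 has the right form but fails the arithmetic check.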

EXAMPLE  3.6  Graph  Connectivity 

•  Problem:  Given  an  undirected  graph  G .  is  it  connected?  In  other  words,  given 
any  two  distinct  vertices  .r  and  y  in  G,  is  there  a  path  from  v  to  v? 

•  Instance  of  the  problem:  Is  the  following  graph  connected? 

1 - 2 - 3 

\  \ 

4  5 

• Encoding of the problem: Let V be a set of binary numbers, one for each vertex
in G. Then we construct <G> as follows:

• Write |V| as a binary number.

• Write a list of edges, each of which is represented by a pair of binary numbers
corresponding to the vertices that the edge connects.

• Separate all such binary numbers by the symbol /.

For example, the graph shown above would be encoded by the following string,
which begins with an encoding of 5 (the number of vertices) and is followed by
four pairs corresponding to the four edges:

101/1/10/10/11/1/100/10/101.

•  The  language  to  be  decided: 

CONNECTED = {w ∈ {0, 1, /}* : w = n1/n2/.../ni, where each ni is a binary
string and w encodes a connected graph, as described above}.
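
To make the encoding concrete, here is a minimal Python sketch of a decider for CONNECTED. The function name decide_connected is hypothetical. The sketch assumes that the vertices are numbered 1 through |V|, as in the example graph, and it treats the empty graph as connected; both are conventions chosen for the sketch and are not fixed by the text.

```python
def decide_connected(w: str) -> bool:
    """Return True iff w encodes a connected graph in the /-separated format."""
    parts = w.split("/")
    # Every field must be a nonempty binary string.
    if any(p == "" or set(p) - {"0", "1"} for p in parts):
        return False
    numbers = [int(p, 2) for p in parts]
    n, endpoints = numbers[0], numbers[1:]
    # The edge list must consist of whole pairs of valid vertex numbers.
    if len(endpoints) % 2 != 0 or any(v < 1 or v > n for v in endpoints):
        return False
    if n == 0:
        return True  # assumed convention: the empty graph counts as connected
    adjacent = {v: set() for v in range(1, n + 1)}
    for u, v in zip(endpoints[0::2], endpoints[1::2]):
        adjacent[u].add(v)
        adjacent[v].add(u)
    # Depth-first search from vertex 1; connected iff every vertex is reached.
    seen, stack = {1}, [1]
    while stack:
        for v in adjacent[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n
```

Applied to the string 101/1/10/10/11/1/100/10/101, the sketch rebuilds the five-vertex graph shown above and searches it from vertex 1.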


EXAMPLE  3.7  Protein  Sequence  Alignment 

• Problem: Given a protein fragment f and a complete protein molecule p, could
f be a fragment from p?

• Encoding of the problem: Represent each protein molecule or fragment as a
sequence of amino acid residues. Assign a letter to each of the 20 possible
amino acids. So a protein fragment might be represented as AGHTYWDNR.

• The language to be decided: {<f, p> : f could be a fragment from p}.


The techniques that we will describe in the rest of this book are widely used
in  computational  biology.  (Appendix  K) 


In each of these examples, we have chosen an encoding that is expressive enough to
make it possible to describe all of the instances of the problem we are interested in. But
have we chosen a good encoding? Might there be another one? The answer to this second
question is yes. And it will turn out that the encoding we choose may have a significant
impact on what we can say about the difficulty of solving the original problem. To
see an example of this, we need look no farther than the addition problem that we just
considered. Suppose that we want to write a program to examine a string in the addition
language that we proposed above. Suppose further that we impose the constraint that
our program reads the string one character at a time, left to right, and that it has only a
finite (bounded in advance, independent of the length of the input string) amount of
memory. These restrictions correspond to the notion of a finite state machine, as we will
see in Chapter 5. It turns out that no machine of this sort can decide the language that we
have described. We’ll see how to prove results such as this in Chapter 8.

But now consider a different encoding of the addition problem. This time we encode
each of the numbers as a binary string, and we write the digits, from lowest order to
highest order, left to right (i.e., backwards from the usual way). Furthermore, we imagine
the three numbers aligned in the way they often are when we draw an addition
problem. So we might encode 10 + 4 = 14 as:

  0101   writing 1010 backwards
+ 0010   writing 0100 backwards
  0111   writing 1110 backwards

We now encode each column of that sum as a single character. Since each column is
a sequence of three binary digits, it may take on any one of 8 possible values. We can
use the symbols a, b, c, d, e, f, g, and h to correspond to 000, 001, 010, 011, 100, 101, 110,
and 111, respectively. So we could encode the 10 + 4 = 14 example as afdf.

It is easy to design a program that reads such a string, left to right, and decides, as
each character is considered, whether the sum so far is correct. For example, if the first
character of a string is c, then the sum is wrong, since 0 + 1 cannot be 0 (although it
could be later if there were a carry bit from the previous column).
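
The column-by-column check can be sketched in a few lines of Python. The names check_column_sum and encode_columns are hypothetical, and the sketch assumes that the sum is wrong if a carry is still pending when the input ends. The only memory the checker keeps between characters is a single carry bit, which is why a machine with bounded memory suffices for this encoding.

```python
SYMBOLS = "abcdefgh"  # a..h stand for the columns 000, 001, ..., 111

def encode_columns(x: int, y: int, z: int) -> str:
    """Encode the claim x + y = z, lowest-order column first."""
    width = max(x.bit_length(), y.bit_length(), z.bit_length(), 1)
    out = []
    for i in range(width):
        # Pack the i-th bits of x, y, and z into one three-bit column value.
        column = ((x >> i) & 1) << 2 | ((y >> i) & 1) << 1 | ((z >> i) & 1)
        out.append(SYMBOLS[column])
    return "".join(out)

def check_column_sum(s: str) -> bool:
    """Read s left to right, remembering only the current carry bit."""
    carry = 0
    for ch in s:
        if ch not in SYMBOLS:
            return False
        value = SYMBOLS.index(ch)
        x, y, z = (value >> 2) & 1, (value >> 1) & 1, value & 1
        total = x + y + carry
        if total % 2 != z:
            return False  # the sum bit in this column is wrong
        carry = total // 2
    return carry == 0  # assumed: a leftover carry means the sum is wrong
```

With these definitions, encode_columns(10, 4, 14) produces afdf, the string derived above, and check_column_sum rejects the one-character string c.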


This idea is the basis for the design of binary adders, as well as larger circuits,
like multipliers, that exploit them. (P.3)

In Part V of this book we will be concerned with the efficiency (stated in terms of
either time or space) of the programs that we write. We will describe both time and
space requirements as functions of the length of the program’s input. When we do that,
it may matter what encoding scheme we have picked since some encodings produce
longer strings than others do. For example, consider the integer 25. It can be encoded:

•  In  decimal  as:  25. 

•  In  binary  as:  11001,  or 

•  In  unary  as:  1111111111111111111111111. 

We’ll return to this issue in Section 27.3.1.
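
A small Python sketch makes the difference in length easy to see. The helper name encodings is hypothetical. Decimal and binary lengths grow with the logarithm of the value, while the unary length grows with the value itself.

```python
def encodings(n: int) -> dict:
    """Return the decimal, binary, and unary encodings of a nonnegative integer."""
    return {
        "decimal": str(n),
        "binary": format(n, "b"),
        "unary": "1" * n,
    }

# Compare the length of each encoding of 25.
for name, text in encodings(25).items():
    print(name, len(text), text)
```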

3.2.2  Casting  Problems  as  Decision  Questions 

Problems that are not already stated as decision questions can be transformed into
decision questions. More specifically, they can be reformulated so that they become
language recognition problems. The idea is to encode, into a single string, both the inputs
and the outputs of the original problem P. So, for example, if P takes two inputs and
produces one result, we could construct strings of the form i1;i2;r. Then a string
s = x;y;z is in the language L that corresponds to P iff z is the result that P produces
given the inputs x and y.


EXAMPLE  3.8  Casting  Addition  as  Decision 

•  Problem:  Given  two  nonnegative  integers,  compute  their  sum. 

•  Encoding  of  the  problem:  We  transform  the  problem  of  adding  two  numbers 
into  the  problem  of  checking  to  see  whether  a  third  number  is  the  sum  of  the 
first  two.  We  can  use  the  same  encoding  that  we  used  in  Example  3.5. 

•  The  language  to  be  decided: 

INTEGERSUM = {w of the form: <integer1>+<integer2>=<integer3>, where
each of the substrings <integer1>, <integer2>, and <integer3> is an element of
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}+ and integer3 is the sum of integer1 and integer2}.


EXAMPLE  3.9  Casting  Sorting  as  Decision 

•  Problem:  Given  a  list  of  integers,  sort  it. 

• Encoding of the problem: We transform the problem of sorting a list into the
problem of examining a pair of lists and deciding whether the second corresponds
to the sorted version of the first.

• The language to be decided:

L = {w1 # w2 : ∃n ≥ 1 (w1 is of the form int1, int2, ..., intn,
w2 is of the form int1, int2, ..., intn, and
w2 contains the same objects as w1 and w2 is sorted)}.

• Example of a string in L: 1,5,3,9,6#1,3,5,6,9.

• Example of a string not in L: 1,5,3,9,6#1,2,3,4,5,6,7.
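
A decider for this language might be sketched as follows. The name in_sorting_language is hypothetical, and the sketch assumes that "sorted" means nondecreasing order and that the list items are whatever Python’s int accepts. Note that the decider never sorts anything: it checks only that the second list is a rearrangement of the first and that it is in order.

```python
from collections import Counter

def in_sorting_language(s: str) -> bool:
    """Return True iff s has the form w1#w2 with w2 the sorted version of w1."""
    parts = s.split("#")
    if len(parts) != 2:
        return False
    try:
        w1 = [int(item) for item in parts[0].split(",")]
        w2 = [int(item) for item in parts[1].split(",")]
    except ValueError:
        return False  # some item is not an integer
    same_objects = Counter(w1) == Counter(w2)  # same items, same multiplicities
    is_sorted = all(a <= b for a, b in zip(w2, w2[1:]))
    return same_objects and is_sorted
```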


EXAMPLE  3.10  Casting  Database  Querying  as  Decision 

• Problem: Given a database and a query, execute the query against the database.

• Encoding of the problem: We transform the task of executing the query into
the problem of evaluating a reply to see if it is correct.

• The language to be decided:

L = {d # q # a : d is an encoding of a database,
q is a string representing a query, and
a is the correct result of applying q to d}.

• Example of a string in L:

(name, age, phone), (John, 23, 567-1234) (Mary, 24, 234-9876) #
(select name age=23) #
(John).


Given each of the problems that we have just considered, there is an important
sense in which the encoding of the problem as a decision question is equivalent to the
original formulation of the problem: Each can be reduced to the other. We’ll have a lot
more to say about the idea of reduction in Chapter 21. But, for now, what we mean by
reduction of one problem to another is that, if we have a program to solve the second,
we can use it to build a program to solve the first. For example, suppose that we have a
program P that adds a pair of integers. Then the following program decides the language
INTEGERSUM, which we described in Example 3.8:

Given a string of the form <integer1>+<integer2>=<integer3> do:

1. Let x = convert-to-integer(<integer1>).

2. Let y = convert-to-integer(<integer2>).

3. Let z = P(x, y).

4. If z = convert-to-integer(<integer3>) then accept. Else reject.
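
The four steps above translate almost directly into Python. In this sketch, P is a stand-in for the assumed adding program, decide_integersum is a hypothetical name, and the regular expression is one possible way to check that the input has the required form (it assumes no spaces around + and =).

```python
import re

def P(x: int, y: int) -> int:
    """Stand-in for the program that adds a pair of integers."""
    return x + y

# One or more decimal digits, "+", digits, "=", digits, and nothing else.
FORM = re.compile(r"([0-9]+)\+([0-9]+)=([0-9]+)")

def decide_integersum(s: str) -> bool:
    """Accept iff s has the form <integer1>+<integer2>=<integer3> and the sum is right."""
    match = FORM.fullmatch(s)
    if match is None:
        return False  # not of the required form, so not in the language
    x, y, z = (int(group) for group in match.groups())
    return P(x, y) == z
```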
Alternatively, if we have a program T that decides INTEGERSUM, then the following
program computes the sum of two integers x and y:

1. Lexicographically enumerate the strings that represent decimal encodings of
nonnegative integers.

2. Each time a string s is generated, create the new string <x>+<y>=s.

3. Feed that string to T.

4. If T accepts <x>+<y>=s, halt and return convert-to-integer(s).
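
The reverse direction can be sketched the same way. Here T is a stand-in decider for INTEGERSUM (any correct decider would do), add_via_decider is a hypothetical name, and the candidates are enumerated in increasing numeric order, which is one reading of the enumeration in step 1. The loop is guaranteed to stop for nonnegative x and y because the candidate eventually reaches their sum.

```python
import re
from itertools import count

FORM = re.compile(r"([0-9]+)\+([0-9]+)=([0-9]+)")

def T(s: str) -> bool:
    """Stand-in decider for INTEGERSUM."""
    m = FORM.fullmatch(s)
    return m is not None and int(m.group(1)) + int(m.group(2)) == int(m.group(3))

def add_via_decider(x: int, y: int) -> int:
    """Compute x + y using nothing but yes/no answers from T."""
    for candidate in count(0):  # 0, 1, 2, ...
        s = str(candidate)
        if T(f"{x}+{y}={s}"):
            return int(s)
```

The procedure is far slower than adding directly, but efficiency is not the point: it shows only that a decider for the language is enough to compute the function.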


3.3  A  Machine-Based  Hierarchy  of  Language  Classes 

In Parts II, III, and IV, we will define a hierarchy of computational models, each more
powerful than the last. The first model is simple: Programs written for it are generally
easy to understand, they run in linear time, and algorithms exist to answer almost any
question we might wish to ask about such programs. The second model is more powerful,
but still limited. The last model is powerful enough to describe anything that can be
computed by any sort of real computer. All of these models will allow us to write programs
whose job is to accept some language L. In this section, we sketch this machine hierarchy
and provide a short introduction to the language hierarchy that goes along with it.


3.3.1  The  Regular  Languages 

The first model we will consider is the finite state machine or FSM. Figure 3.1 shows a
simple FSM that accepts strings of a’s and b’s, where all a’s come before all b’s.

The input to an FSM is a string, which is fed to it one character at a time, left to right.
The FSM has a start state, shown in the diagram with an unlabelled arrow leading to it,
and some number (zero or more) of accepting states, which will be shown in our diagrams
with double circles. The FSM starts in its start state. As each character is
read, the FSM changes state based on the transitions shown in the figure. If an FSM
M is in an accepting state after reading the last character of some input string s,
then M accepts s. Otherwise it rejects it. Our example FSM stays in state 1 as long as it
is reading a’s. When it sees a b, it moves to state 2, where it stays as long as it continues
seeing b’s. Both state 1 and state 2 are accepting states. But if, in state 2, it sees an a, it
goes to state 3, a nonaccepting state, where it stays until it runs out of input. So, for
example, this machine will accept aab, aabbb, and bb. It will reject ba.


FIGURE 3.1 A simple FSM.
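
The three-state machine described in the text can be simulated with a transition table. In this Python sketch the names DELTA, ACCEPTING, and fsm_accepts are hypothetical, and rejecting any symbol other than a or b is an assumption, since the text considers only strings of a’s and b’s.

```python
# Transition table: (current state, input character) -> next state.
DELTA = {
    (1, "a"): 1, (1, "b"): 2,
    (2, "b"): 2, (2, "a"): 3,
    (3, "a"): 3, (3, "b"): 3,
}
ACCEPTING = {1, 2}

def fsm_accepts(s: str) -> bool:
    """Run the machine on s, one character at a time, left to right."""
    state = 1  # the start state
    for ch in s:
        if (state, ch) not in DELTA:
            return False  # assumed: symbols outside {a, b} are rejected
        state = DELTA[(state, ch)]
    return state in ACCEPTING
```

The simulator stores nothing except the current state, and it takes one step per input character.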
We will call the class of languages that can be accepted by some FSM regular. As we
will see in Part II, many useful languages are regular, including binary strings with even
parity, syntactically well-formed floating point numbers, and sequences of coins that
are sufficient to buy a soda.


3.3.2  The  Context-Free  Languages 

But there are useful simple languages that are not regular. Consider, for example, Bal,
the language of balanced parentheses. Bal contains strings like (()) and ()(); it does
not contain strings like ()))(. Because it’s hard to read strings of parentheses, let’s
consider instead the related language AnBn = {a^n b^n : n ≥ 0}. In any string in AnBn,
all the a’s come first and the number of a’s equals the number of b’s. We could try to
build an FSM to accept AnBn. But the problem is, “How shall we count the a’s so that
we can compare them to the b’s?” The only memory in an FSM is in the states and we
must choose a fixed number of states when we build our machine. But there is no
bound on the number of a’s we might need to count. We will prove in Chapter 8 that it
is not possible to build an FSM to accept AnBn.

But languages like Bal and AnBn are important. For example, almost every programming
language and query language allows parentheses, so any front end for such a
language must be able to check to see that the parentheses are balanced. Can we augment
the FSM in a simple way and thus be able to solve this problem? The answer is
yes. Suppose that we add one thing, a single stack. We will call any machine that consists
of an FSM, plus a single stack, a pushdown automaton or PDA.

We can easily build a PDA M to accept AnBn. The idea is that, each time it sees an a, M
will push it onto the stack. Then, each time it sees a b, it will pop an a from the stack. If it
runs out of input and stack at the same time and it is in an accepting state, it will accept.
Otherwise, it will reject. M will use the same state structure that we used in our FSM
example above to guarantee that all the a’s come before all the b’s. In diagrams of PDAs,
read an arc label of the form x/y/z to mean, “if the input is an x, and it is possible to pop y
off the stack, then take the transition, do the pop of y, and push z”. If the middle argument
is ε, then don’t bother to check the stack. If the third argument is ε, then don’t push
anything. Using those conventions, the PDA shown in Figure 3.2 accepts AnBn.
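
The behavior of this PDA can be imitated with a Python list used as the stack. This is a sketch of the idea rather than a transcription of Figure 3.2: the name accepts_anbn and the two state numbers are assumptions. State 1 means that only a’s have been read so far, and state 2 means that at least one b has been read.

```python
def accepts_anbn(s: str) -> bool:
    """Accept iff s consists of some number of a's followed by as many b's."""
    stack = []
    state = 1
    for ch in s:
        if state == 1 and ch == "a":
            stack.append("a")  # like an arc labeled a/ε/a: push an a
        elif ch == "b" and stack:
            stack.pop()        # like an arc labeled b/a/ε: pop an a
            state = 2
        else:
            return False       # an a after a b, a b with no a to pop, or a bad symbol
    return not stack           # input and stack must run out together
```

Unlike the FSM simulator, this one has unbounded memory: the stack grows with the number of a’s read.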

FIGURE 3.2 A simple PDA that accepts AnBn (arc labels: a/ε/a and b/a/ε).

Using a very similar sort of PDA, we can build a machine to accept Bal and other
languages whose strings are composed of properly nested substrings. For example, a
palindrome is a string that reads the same right-to-left as it does left-to-right. We can
easily build a PDA to accept the language PalEven = {ww^R : w ∈ {a, b}*}, the
language of even-length palindromes of a’s and b’s. The PDA for PalEven simply pushes
all the characters in the first half of its input string onto the stack, guesses where the
middle is, and then starts popping one character for each remaining input character. If
there is a guess that causes the pushed string (which will be popped off in reverse
order) to match the remaining input string, then the input string is in PalEven.
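
A deterministic program cannot guess, but it can try every possible guess in turn. The following Python sketch (the name accepts_pal_even is hypothetical) simulates the PDA for PalEven that way: for each candidate middle position it pushes the presumed first half and then pops one character per remaining input character. Trying the guesses one at a time is an assumption of the sketch, not the way the PDA itself is defined.

```python
def accepts_pal_even(s: str) -> bool:
    """Accept iff some guess of the middle makes the two halves match."""
    if set(s) - {"a", "b"}:
        return False  # the language is defined over {a, b} only
    for middle in range(len(s) + 1):  # one iteration per possible guess
        stack = list(s[:middle])      # push the presumed first half
        matched = True
        for ch in s[middle:]:         # pop once per remaining character
            if not stack or stack.pop() != ch:
                matched = False
                break
        if matched and not stack:     # input and stack ran out together
            return True
    return False
```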

But we should note some simple limitations to the power of the PDA. Consider the
language WW = {ww : w ∈ {a, b}*}, which is just like PalEven except that the second
half of each of its strings is an exact copy of the first half (rather than the reverse of it).
Now, as we’ll prove in Chapter 13, it is not possible to build an accepting PDA
(although it would be possible to build an accepting machine if we could augment the
finite state controller with a first-in, first-out queue rather than a stack).

We will call the class of languages that can be accepted by some PDA context-free.
As we will see in Part III, many useful languages are context-free, including most
programming languages, query languages, and markup languages.

3.3.3  The  Decidable  and  Semidecidable  Languages 

But there are useful straightforward languages that are not context-free. Consider, for
example, the language of English sentences in which some word occurs more than
once. As an even simpler (although probably less useful) example, consider another
language to which we will give a name. Let AnBnCn = {a^n b^n c^n : n ≥ 0}, i.e., the
language composed of all strings of a’s, b’s, and c’s such that all the a’s come first, followed
by all the b’s, then all the c’s, and the number of a’s equals the number of b’s equals the
number of c’s. We could try to build a PDA to accept AnBnCn. We could use the stack
to count the a’s, just as we did for AnBn. We could pop the stack as the b’s come in and
compare them to the a’s. But then what shall we do about the c’s? We have lost all
information about the a’s and the b’s since, if they matched, the stack will be empty. We
will prove in Chapter 13 that it is not possible to build a PDA to accept AnBnCn.

But it is easy to write a program to accept AnBnCn. So, if we want a class of machines
that can capture everything we can write programs to compute, we need a model that
is stronger than the PDA. To meet this need, we will introduce a third kind of machine.
We will get rid of the stack and replace it with an infinite tape. The tape will have a single
read/write head. Only the tape square under the read/write head can be accessed
(for reading or for writing). The read/write head can be moved one square in either
direction on each move. The resulting machine is called a Turing machine. We will also
change the way that input is given to the machine. Instead of streaming it, one character
at a time, the way we did for FSMs and PDAs, we will simply write the input string
onto the tape and then start the machine with the read/write head just to the left of the
first input character. We show the structure of a Turing machine in Figure 3.3. The
arrow under the tape indicates the location of the read/write head.

At each step, a Turing machine M considers its current state and the character that is
on the tape directly under its read/write head. Based on those two things, it chooses its
next state, chooses a character to write on the tape under the read/write head, and chooses
whether to move the read/write head one square to the right or one square to the left.
A finite segment of M’s tape contains the input string. The rest is blank, but M may move
the read/write head off the input string and write on the blank squares of the tape.

FIGURE 3.3 The structure of a Turing machine.


There exists a simple Turing machine that accepts AnBnCn. It marks off the
leftmost a, scans to the right to find a b, marks it off, continues scanning to the right,
finds a c, and marks it off. Then it goes back to the left, marks off the next a, and so
forth. When it runs out of a’s, it makes one final pass to the right to make sure that
there are no extra b’s or c’s. If that check succeeds, the machine accepts. If it fails, or if
at any point the machine failed to find a required b or c, it rejects. For the details of
how this machine operates, see Example 17.8.
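
The marking strategy can be imitated in Python, with a list standing in for the tape. This is only a sketch of the strategy described above and not the machine of Example 17.8: the name accepts_anbncn and the marks 1, 2, and 3 (for a marked-off a, b, and c) are assumptions.

```python
def accepts_anbncn(s: str) -> bool:
    """Mark off one a, one b, and one c per pass until the a's run out."""
    if set(s) - {"a", "b", "c"}:
        return False
    tape = list(s)
    n = len(tape)
    while True:
        i = 0
        while i < n and tape[i] == "1":       # skip a's already marked off
            i += 1
        if i == n or tape[i] != "a":
            break                             # no a's left: go to the final check
        tape[i] = "1"                         # mark off the leftmost a
        i += 1
        while i < n and tape[i] in ("a", "2"):
            i += 1                            # scan right past a's and marked b's
        if i == n or tape[i] != "b":
            return False                      # failed to find a required b
        tape[i] = "2"
        i += 1
        while i < n and tape[i] in ("b", "3"):
            i += 1                            # scan right past b's and marked c's
        if i == n or tape[i] != "c":
            return False                      # failed to find a required c
        tape[i] = "3"
    # Final pass: any unmarked symbol is an extra (or misplaced) a, b, or c.
    return all(ch in ("1", "2", "3") for ch in tape)
```

Each pass rescans the tape from the left, so the running time grows roughly as the square of the input length; a program with a few counters would be faster, but this version stays close to what a single read/write head can do.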

Finite state machines and pushdown automata (with one technical exception that
we can ignore for now) are guaranteed to halt. They must do so when they run out of
input. Turing machines, on the other hand, carry no such guarantee. The input simply
sits on the tape. A Turing machine may (and generally does) move back and forth
across its input many times. It may move back and forth forever. Or it may simply
move in one direction, off the input onto the blank tape, and keep going forever.
Because of its flexibility in using its tape to record its computation, the Turing machine is
a more powerful model than either the FSM or the PDA. In fact, we will see in Chapter
18 that any computation that can be written in any programming language or run
on any modern computer can be described as a Turing machine. However, when we
work with Turing machines, we must be aware of the fact that they cannot be guaranteed
to halt. And, unfortunately, we can prove (as we will do in Chapter 19) that there
exists no algorithm that can examine a Turing machine and tell whether or not it will
halt (on any one input or on all inputs). This fundamental result about the limits of
computation is known as the undecidability of the halting problem.

We  will  use  the  Turing  machine  to  define  two  new  classes  of  languages: 

• A language L is decidable iff there exists a Turing machine M that halts on all
inputs, accepts all strings that are in L, and rejects all strings that are not in L. In other
words, M can always say yes or no, as appropriate.

• A language L is semidecidable iff there exists a Turing machine M that accepts all
strings that are in L and fails to accept every string that is not in L. Given a string
that is not in L, M may reject or it may loop forever. In other words, M can recognize
a solution and then say yes, but it may not know when it should give up looking
for a solution and say no.
Bal, AnBn, PalEven, WW, and AnBnCn are all decidable languages. Every decidable
language is also semidecidable (since the requirement for semidecidability is strictly
weaker than the requirement for decidability). But there are languages that are
semidecidable yet not decidable. As an example, consider L = {<p, w> : p is a Java
program that halts on input w}. L is semidecidable by a Turing machine that simulates p
running on w. If the simulation halts, the semidecider can halt and accept. But, if the
simulation does not halt, the semidecider will not be able to recognize that it isn’t going
to. So it has no way to halt and reject. Just as there exists no algorithm that can examine
a Turing machine and decide whether or not it will halt, there is no algorithm to
examine a Java program (without having to run it) and make that determination. So L is
semidecidable but not decidable.
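
The asymmetry between accepting and rejecting can be seen in a small Python sketch. Everything in it is an assumption made for illustration: programs are modeled as Python generator functions (each yield counts as one step) rather than as Java programs, and the names semidecide_halts, countdown, and loop_forever are hypothetical. The optional step budget exists only so that the demonstration itself terminates. When the budget runs out, the function returns None, meaning "no verdict yet", because it has no basis for answering no.

```python
def semidecide_halts(program, w, max_steps=None):
    """Simulate program on w; return True if it halts, None if the budget runs out."""
    run = program(w)
    steps = 0
    while max_steps is None or steps < max_steps:
        try:
            next(run)        # simulate one more step
        except StopIteration:
            return True      # the simulation halted, so accept
        steps += 1
    return None              # budget exhausted: still no basis for rejecting

def countdown(w):
    """A program that halts after len(w) steps."""
    n = len(w)
    while n > 0:
        n -= 1
        yield

def loop_forever(w):
    """A program that never halts."""
    while True:
        yield
```

With max_steps left as None, semidecide_halts behaves like the semidecider in the text: it accepts whenever the simulated program halts and otherwise runs forever.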


3.3.4 The Computational Hierarchy and Why It Is Important

We  have  now  defined  four  language  classes: 

1.  Regular  languages,  which  can  be  accepted  by  some  finite  state  machine. 

2.  Context-free  languages,  which  can  be  accepted  by  some  pushdown  automaton. 

3. Decidable (or simply D) languages, which can be decided by some Turing machine
that always halts.

4. Semidecidable (or SD) languages, which can be semidecided by some Turing machine
that halts on all strings in the language.

Each  of  these  classes  is  a  proper  subset  of  the  next  class,  as  illustrated  in  the  diagram 
shown  in  Figure  3.4. 

As  we  move  outward  in  the  language  hierarchy,  we  have  access  to  tools  with  greater 
and  greater  expressive  power.  So,  for  example,  we  can  define  AnBn  as  a  context-free 
language  but  not  as  a  regular  one.  We  can  define  AnBnCn  as  a  decidable  language  but 
not  as  a  context-free  or  a  regular  one.  This  matters  because  expressiveness  generally 
comes  at  a  price.  The  price  may  be: 

• Computational efficiency: Finite state machines run in time that is linear in the
length of the input string. A general context-free parser based on the idea of a
pushdown automaton requires time that grows as the cube of the length of the
input string. A Turing machine may require time that grows exponentially (or
faster) with the length of the input string.

• Decidability: There exist procedures to answer many useful questions about finite
state machines. For example, does an FSM accept some particular string? Is an FSM
minimal (i.e., is it the simplest machine that does the job it does)? Are two FSMs
identical? A subset of those questions can be answered for pushdown automata.
None of them can be answered for Turing machines.

• Clarity: There exist tools that enable designers to draw and analyze finite state
machines. Every regular language can also be described using the (often very
convenient) regular expression pattern language that we will define in Chapter 6.
Every context-free language, in addition to being recognizable by some pushdown
automaton, can (as we will see in Chapter 11) be described with a context-free
grammar. For many important kinds of languages, context-free grammars are
sufficiently natural that they are commonly used as documentation tools. No
corresponding tools exist for the broader classes of decidable and semidecidable
languages.

FIGURE 3.4 A hierarchy of language classes.

So, as a practical as well as a theoretical matter, it makes sense, given a particular
problem, to describe it using the simplest (i.e., expressively weakest) formalism that is
adequate to the job.

The Rule of Least Power⁵: “Use the least powerful language suitable for expressing
information, constraints or programs on the World Wide Web.”

Although stated in the context of the World Wide Web, the Rule of Least Power applies
far more broadly. We’re appealing to a generalization of it here. We’ll return to a
discussion of it in the specific context of the Semantic Web in I.3.

In Parts II, III, and IV of this book, we explore the language hierarchy that we have
just defined. We will start with the smallest class, the regular languages, and move
outwards.


⁵ Quoted from [Berners-Lee and Mendelsohn 2006].




3.4  A  Tractability  Hierarchy  of  Language  Classes 

The decidable languages, as defined above, are those that can, in principle, be decided.
Unfortunately, in the case of some of them, any procedure that can decide whether or
not a string is in the language may require, on reasonably large inputs, more time steps
than have elapsed since the Big Bang. So it makes sense to take another look at the
class of decidable languages, this time from the perspective of the resources (time,
space, or both) that may be required by the best decision procedures we can construct.
We will do that in Part V. So, for example, we will define the classes:

• P, which contains those languages that can be decided in time that grows as some
polynomial function of the length of the input,

• NP, which contains those languages that can be decided by a nondeterministic
machine (one that can conduct a search by guessing which move to make) with the
property that the amount of time required to explore one sequence of guesses (one
path) grows as some polynomial function of the length of the input, and

• PSPACE, which contains those languages that can be decided by a machine whose
space requirement grows as some polynomial function of the length of the input.

These  classes,  like  the  ones  that  we  defined  in  terms  of  particular  kinds  of  machines, 
can  be  arranged  in  a  hierarchy.  For  example,  it  is  the  case  that: 

P ⊆ NP ⊆ PSPACE

Unfortunately, as we will see, less is known about the structure of this hierarchy than
about the structure of the hierarchy we drew in the last section. For example, perhaps
the biggest open question of theoretical computer science is whether P = NP. It is
possible, although generally thought to be very unlikely, that every language that is in NP
is also in P. For this reason, we won’t draw a picture here. Any picture we could draw
might suggest a situation that will eventually turn out not to be true.

Exercises 

1. Consider the following problem: Given a digital circuit C, does C output 1 on all
inputs? Describe this problem as a language to be decided.

2.  Using  the  technique  we  used  in  Example  3.8  to  describe  addition,  describe  square 
root  as  a  language  recognition  problem. 

3. Consider the problem of encrypting a password, given an encryption key. Formulate
this problem as a language recognition problem.

4.  Consider  the  optical  character  recognition  (OCR)  problem:  Given  an  array  of 
black  and  white  pixels  and  a  set  of  characters,  determine  which  character  best 
matches  the  pixel  array.  Formulate  this  problem  as  a  language  recognition 
problem. 

5. Consider the language AnBnCn = {a^n b^n c^n : n ≥ 0}, discussed in Section 3.3.3.
We might consider the following design for a PDA to accept AnBnCn: As each a
is read, push two a’s onto the stack. Then pop one a for each b and one a for each
c. If the input and the stack come out even, accept. Otherwise reject. Why doesn’t
this work?

6. Define a PDA-2 to be a PDA with two stacks (instead of one). Assume that the
stacks can be manipulated independently and that the machine accepts iff it is
in an accepting state and both stacks are empty when it runs out of input. Describe
the operation of a PDA-2 that accepts AnBnCn = {a^n b^n c^n : n ≥ 0}.
(Note: We will see, in Section 17.5.2, that the PDA-2 is equivalent to the Turing
machine in the sense that any language that can be accepted by one can be
accepted by the other.)


CHAPTER  4 


Computation 


Our goal in this book is to be able to make useful claims about problems and the
programs that solve them. Of course, both problem specifications and the programs
that solve them take many different forms. Specifications can be written
in English, or as a set of logical formulas, or as a set of input/output pairs. Programs can
be written in any of a wide array of common programming languages. As we said in the
last chapter, in this book we are, for the most part, going to depart from those standard
methods and, instead:

• Define problems as languages to be decided, and

• Define programs as state machines whose input is a string and whose output is
Accept or Reject.

Both  because  of  this  change  in  perspective  and  because  we  are  going  to  introduce 
two  ideas  that  are  not  common  in  everyday  programming  practice,  we  will  pause,  in 
this  chapter,  and  look  at  what  we  mean  by  computation  and  how  we  are  going  to  go 
about  it.  In  particular,  we  will  examine  three  key  ideas: 

1.  Decision  procedures. 

2. Nondeterminism.

3.  Functions  on  languages  (alternatively,  programs  that  operate  on  other  programs). 

Once  we  have  finished  this  discussion,  we  will  begin  our  examination  of  the  language 
classes  that  we  outlined  in  Chapter  3. 


4.1  Decision  Procedures 

Recall that a decision problem is one for which we must make a yes/no decision. An
algorithm is a detailed procedure that accomplishes some clearly specified task. A
decision procedure is an algorithm to solve a decision problem. Put another way, it is a
program whose result is a Boolean value. Note that, in order to be guaranteed to return
a Boolean value, a decision procedure must be guaranteed to halt on all inputs.

This book is about decision procedures. We will spend most of our time discussing
decision procedures to answer questions of the form:

• Is string s in language L?

But  we  will  also  attempt  to  answer  other  questions,  in  particular  ones  that  ask  about 
the  machines  that  we  will  build  to  answer  the  first  group  of  questions.  So  we  may  ask 
questions  such  as: 

•  Given  a  machine  (an  FSM,  a  PDA,  or  a  Turing  machine),  does  it  accept  any  strings? 

•  Given  two  machines,  do  they  accept  the  same  strings? 

•  Given  a  machine,  is  it  the  smallest  (simplest)  machine  that  does  its  job? 

If  we  have  in  mind  a  decision  problem  to  which  we  want  an  answer,  there  are  three 
things  we  may  want  to  know: 

1. Does there exist a decision procedure (i.e., an algorithm) to answer the question?
A decision problem is decidable iff the answer to this question is yes. A decision
problem is undecidable iff the answer to this question is no. A decision
problem is semidecidable iff there exists an algorithm that halts and returns
True iff True is the answer. When False is the answer, it may either halt and return
False or it may loop. Some undecidable problems are semidecidable; some
are not even that.

2.  If  any  decision  procedures  exist,  find  one. 

3. Again, if any decision procedures exist, what is the most efficient one and how
efficient is it?

In  the  early  part  of  this  book,  we  will  ask  questions  for  which  decision  procedures 
exist  and  we  will  often  skip  directly  to  question  2.  But,  as  we  progress,  we  will  begin  to 
ask  questions  for  which,  provably,  no  decision  procedure  exists.  It  is  because  there  are 
such  problems  that  we  have  articulated  question  1. 

Decision  procedures  are  programs.  They  must  possess  two  correctness  properties: 

1.  The  program  must  be  guaranteed  to  halt  on  all  inputs. 

2.  When  the  program  halts  and  returns  an  answer,  it  must  be  the  correct  answer  for 
the  given  input. 

Let’s  consider  some  examples. 


EXAMPLE  4.1  Checking  for  Even  Numbers 

Is the integer x even? This one is easy. Assume that / performs (truncating) integer
division. Then the following program answers the question:

even (x: integer) =

    If (x/2) * 2 = x then return True else return False.


EXAMPLE 4.2 Checking for Prime Numbers

Is the positive integer x prime? Given an appropriate string encoding, this problem
corresponds to the language PRIMES that we defined in Example 3.4. Defining
a procedure to answer this question is not hard, although it will require a loop
and so it will be necessary to prove that the loop always terminates. Several algorithms
that solve this problem exist. Here's an easy one:

prime (x: positive integer) =

    For i = 2 to ceiling(sqrt(x)) do:

        If (x/i) * i = x then return False.

    Return True.

The function ceiling(x), also written ⌈x⌉, returns the smallest integer that is
greater than or equal to x. This program is guaranteed to halt. The natural numbers
between 0 and ceiling(sqrt(x)) - 2 form a well-ordered set under ≤. Let
index correspond to ceiling(sqrt(x)) - i. At the beginning of the first pass through
the loop, the value of index is ceiling(sqrt(x)) - 2. The value of index decreases by
one each time through the loop. The loop ends when that value becomes 0. It's
worth pointing out that, while this program is simple and it is easy to prove that it
is correct, it is not the most efficient program that we could write. We’ll have more
to say about this problem in Sections 28.1.7 and 30.2.4.
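
Examples 4.1 and 4.2 translate almost directly into a modern language. The following
Python sketch is one possible rendering, not a definitive one. It makes two assumptions
that go beyond the text: Python's floor division operator // stands in for truncating
division, and the loop bound is math.isqrt(x), the floor of the square root, rather than
the ceiling. With the ceiling as the bound, the divisor i = 2 would be tested against
x = 2 itself, so the sketch also handles values below 2 separately.

```python
import math


def even(x: int) -> bool:
    """Decision procedure for 'is x even?': no loop, so it always halts."""
    return (x // 2) * 2 == x


def prime(x: int) -> bool:
    """Decision procedure for 'is x prime?' by trial division.

    The loop runs over a finite range, so it is guaranteed to halt.
    """
    if x < 2:
        return False  # assumption: 0 and 1 are treated as not prime
    for i in range(2, math.isqrt(x) + 1):
        if (x // i) * i == x:  # i divides x exactly
            return False
    return True
```

Both functions return a Boolean on every input, which is what makes them decision
procedures in the sense defined above.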


For our next few examples we need a definition. The sequence:

F_n = 2^(2^n) + 1, n ≥ 0,

defines the Fermat numbers. The first few Fermat numbers are:

F_0 = 3, F_1 = 5, F_2 = 17, F_3 = 257, F_4 = 65,537, F_5 = 4,294,967,297.

EXAMPLE  4.3  Checking  for  Small  Prime  Fermat  Numbers 

Are there any prime Fermat numbers less than 1,000,000? There exists a simple
decision procedure to answer this question:

fermatSmall () =

    i = 0.

    Repeat:

        candidate = (2 ** (2 ** i)) + 1.

        If candidate is prime then return True.

        i = i + 1.

    until candidate ≥ 1,000,000.

    Return False.

This  algorithm  is  guaranteed  to  halt  because  the  value  of  candidate  increases  each 
time  through  the  loop  and  the  loop  terminates  when  its  value  exceeds  a  fixed 
bound.  We  will  skip  the  proof  that  the  correct  answer  is  returned. 


EXAMPLE  4.4  Checking  for  Large  Prime  Fermat  Numbers 

Are there any prime Fermat numbers greater than 1,000,000? This question is different
in one important way from the previous one. Does there exist a decision
procedure to answer this question? What about:

fermatLarge () =

    i = 0.

    Repeat:

        candidate = (2 ** (2 ** i)) + 1.

        If candidate > 1,000,000 and is prime then return True.

        i = i + 1.

    Return False.


What can we say about this program? If there is a prime Fermat number
greater than 1,000,000, fermatLarge will find it and will halt. But suppose that
there is no such number. Then the program will loop forever. FermatLarge is not
capable of returning False even if False is the correct answer. So, is fermatLarge a
decision procedure? No. A decision procedure must halt and return the correct
answer, whatever that is.

Can we do better? Is there a decision procedure to answer this question? Yes.
Since this question takes no arguments, it has a simple answer, either True or
False. So either

fermatYes () =

    Return True.

or

fermatNo () =

    Return False.

correctly answers the question. Our problem now is, “Which one?” No one knows.
Fermat himself was only able to generate the first five Fermat numbers, and, on
that basis, conjectured that all Fermat numbers are prime. If he had been right,
then fermatYes answers the question. However, it now seems likely that there
are no prime Fermat numbers greater than 65,537. A substantial effort continues
to be devoted to finding one, but so far the only discoveries have been
larger and larger composite Fermat numbers. But there is also no proof that a
larger prime one does not exist nor is there an algorithm for finding one. We
simply do not know.
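
The contrast between Examples 4.3 and 4.4 can be made concrete with a short Python
sketch. It is only an illustration under stated assumptions: is_prime is a plain trial
division helper, and the max_index parameter of fermat_large is a hypothetical addition
that caps the search so that the sketch can actually be run. Without that cap,
fermat_large would behave like the procedure in Example 4.4 and might never return.

```python
import math
from itertools import count


def is_prime(x: int) -> bool:
    """Trial division; always halts."""
    if x < 2:
        return False
    return all(x % i != 0 for i in range(2, math.isqrt(x) + 1))


def fermat(n: int) -> int:
    """The n-th Fermat number, 2^(2^n) + 1."""
    return 2 ** (2 ** n) + 1


def fermat_small(limit: int = 1_000_000) -> bool:
    """Halts on every call: candidates grow until they reach the limit."""
    i = 0
    while True:
        candidate = fermat(i)
        if is_prime(candidate):
            return True
        i += 1
        if candidate >= limit:
            return False


def fermat_large(limit: int = 1_000_000, max_index=None):
    """Returns True if a prime Fermat number above limit is found.

    With max_index=None the search is unbounded and may loop forever.
    With a cap (an assumption added for this sketch) it returns None,
    meaning 'no answer yet', once the cap is passed.
    """
    for i in count():
        if max_index is not None and i > max_index:
            return None
        candidate = fermat(i)
        if candidate > limit and is_prime(candidate):
            return True
```

Note that the capped version never returns False. Giving up after a fixed number of
candidates tells us nothing about the candidates that were not examined.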


EXAMPLE  4.5  Checking  for  Programs  That  Halt  on  a  Particular  Input 

Now consider a problem that is harder and that cannot be solved by a simple constant
function such as fermatYes or fermatNo. Given an arbitrary Java program p
that takes a string w as an input parameter, does p halt on some particular value
of w? Here's a candidate for a decision procedure:

haltsOnw (p: program, w: string) =

    1. Simulate the execution of p on w.

    2. If the simulation halts return True else return False.

Is haltsOnw a decision procedure? No, because it can never return the value
False. Yet False is sometimes the correct answer (since there are (p, w) pairs such
that p fails to halt on w). When haltsOnw should return False, it will loop forever
in step 1. Can we do better? No. It is possible to prove, as we will do in Chapter 19,
that no decision procedure for this question exists.


Define a semidecision procedure to be a procedure that halts and returns True whenever
True is the correct answer. But, whenever False is the correct answer, it may return
False or it may loop forever. In other words, a semidecision procedure knows when to
say yes but it is not guaranteed to know when to say no. A semidecidable problem is a
problem for which a semidecision procedure exists. Example 4.5 is a semidecidable
problem. While some semidecidable problems are also decidable, that one isn't.
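
The shape of a semidecision procedure can be sketched in Python. This is an illustration
only, and it rests on a modeling assumption that is not in the text: a "program" is
represented as a Python generator function that yields once per step of its execution.
The names halts_on, countdown and spin are hypothetical, and the optional max_steps
budget exists only so that the sketch can be run on a program that never halts.

```python
def halts_on(program, w, max_steps=None):
    """Simulate program on w one step at a time.

    Returns True as soon as the simulation halts. With max_steps=None the
    call loops forever when program does not halt on w, which is exactly
    the behavior of a semidecision procedure. With a budget, it returns
    None ('unknown') instead. It never returns False.
    """
    steps = 0
    for _ in program(w):
        steps += 1
        if max_steps is not None and steps >= max_steps:
            return None
    return True


def countdown(w):
    """A program that halts after len(w) steps."""
    n = len(w)
    while n > 0:
        n -= 1
        yield


def spin(w):
    """A program that never halts."""
    while True:
        yield
```

For example, halts_on(countdown, "abc") returns True, while halts_on(spin, "abc",
max_steps=1000) returns None. Running out of budget is not evidence that the program
fails to halt, which is why the budgeted call cannot answer False.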


EXAMPLE  4.6  Checking  for  Programs  That  Halt  on  All  Inputs 

Now consider an even harder problem: Given an arbitrary Java program that
takes a single string as an input parameter, does it halt on all possible input values?
Here’s a candidate for a decision procedure:

haltsOnAll (program) =

    1. For i = 1 to infinity do:

           Simulate the execution of program on all possible input
           strings of length i.

    2. If all of the simulations halt return True else return False.

HaltsOnAll will never halt on any program since, to do so, it must try running
the program on an infinite number of strings. And there is not a better procedure
to answer this question. We will show, in Chapter 21, that it is not even
semidecidable.


The  bottom  line  is  that  there  are  three  kinds  of  questions: 

•  Those  for  which  a  decision  procedure  exists. 

•  Those  for  which  no  decision  procedure  exists  but  a  semidecision  procedure  exists. 

• Those for which not even a semidecision procedure exists.

As  we  move  through  the  language  classes  that  we  will  consider  in  this  book,  we  will 
move from worlds in which there exist decision procedures for just about every question
we can think of to worlds in which there exist some decision procedures and perhaps
some semidecision procedures, all the way to worlds in which there do not exist
even  semidecision  procedures. 

But  keep  in  mind  throughout  that  entire  progression  what  a  decision  procedure  is.  It 
is  an  algorithm  that  is  guaranteed  to  halt  on  all  inputs. 


4.2  Determinism  and  Nondeterminism 

Imagine  adding  to  a  programming  language  the  function  choose,  which  may  be  written 
in  either  of  the  following  forms: 

• choose (action 1;;
          action 2;;
          ...
          action n)

• choose (x from S: P(x))

In  the  first  form,  choose  is  presented  with  a  finite  list  of  alternatives,  each  of  which 
will  return  either  a  successful  value  or  the  value  False.  Choose  will: 

•  Return  some  successful  value,  if  there  is  one. 

• If there is no successful value, then choose will:

    • Halt and return False if all the actions halt and return False.

    • Fail to halt if any of the actions fails to halt. We want to define choose this way since
      any path that has not halted still has the potential to return a successful value.

In the second form, choose is presented with a set S of values. S may be finite or it
may be infinite if it is specified by a generator. Choose will:

• Return some element x of S such that P(x) halts with a value other than False, if
  there is one.

• If there is no such element, then choose will:

    • Halt and return False if it can be determined that, for all elements x of S, P(x) is not
      satisfied. This will happen if S is finite and there is a procedure for checking P that
      always halts. It may also happen, even if S is infinite, if there is some way, short of
      checking all the elements, to determine that no elements that satisfy P exist.

    • Fail to halt if there is no mechanism for determining that no elements of S that
      satisfy P exist. This may happen either because S is infinite or because there is
      no algorithm, guaranteed to halt on all inputs, that checks for P and returns
      False when necessary.

In  both  forms,  the  job  of  choose  is  to  find  a  successful  value  (which  we  will  define  to 
be  any  value  other  than  False)  if  there  is  one.  When  we  don't  care  which  successful 
value  we  find  (or  how  we  find  it),  choose  is  a  useful  abstraction,  as  we  will  see  in  the 
next  few  examples. 

We will call programs that are written in our new language, which includes
choose, nondeterministic. We will call programs that are written without using choose
deterministic.

Real computers are, of course, deterministic. So, if choose is going to be useful, there
must exist a way to implement it on a deterministic machine. For now, however, we will
be noncommittal as to how that is done. It may try the alternatives one at a time, or it
may pursue them in parallel. If it tries them one at a time, it may try them in the order
listed, in some random order, or in some order that is carefully designed to maximize
the chances of finding a successful value without trying all the others. The only requirement
is that it must pursue the alternatives in some fashion that is guaranteed to find a
successful value if there is one. The point of the choose function is that we can separate
the design of the choosing mechanism from the design of the program that needs a
value and calls choose to find one.
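
One way to meet that requirement on a deterministic machine is to interleave the
alternatives, giving each one step in turn. The Python sketch below is a minimal
illustration of the first form of choose, not a definitive implementation. It assumes
(this is our modeling choice, not something the text specifies) that each action is a
generator function that yields once per step of work and finally returns either a
successful value or False. The helper names fails, succeeds and loops_forever are
hypothetical.

```python
from collections import deque


def choose(*actions):
    """Round-robin over step-wise actions.

    Returns the first successful (non-False) value that any action produces.
    Returns False only if every action halts and returns False. If no action
    succeeds and at least one never halts, this call never halts either.
    """
    running = deque(action() for action in actions)
    while running:
        current = running.popleft()
        try:
            next(current)            # let this alternative take one step
        except StopIteration as done:
            if done.value is not False:
                return done.value    # a successful value was found
            continue                 # this alternative failed; drop it
        running.append(current)      # not finished; revisit it later
    return False


def loops_forever():
    while True:
        yield


def fails():
    yield
    return False


def succeeds():
    yield
    yield
    return "plan"
```

Because every alternative gets a turn, choose(loops_forever, fails, succeeds) returns
"plan" even though the first alternative never halts. Trying the alternatives strictly
in the order listed would not have that guarantee.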


EXAMPLE  4.7  Nondeterministically  Choosing  a  Travel  Plan 

Suppose  that  we  regularly  plan  medium  length  trips.  We  are  willing  to  drive  or  to 
fly  and  rent  a  car  or  to  take  a  train  and  use  public  transportation  if  it  is  available 
when  we  get  there,  as  long  as  the  total  cost  of  the  trip  and  the  total  time  required 
are reasonable. We don’t care about small differences in time or cost enough to
make it worth exhaustively exploring all the options every time. We can define the
function trip-plan to solve our problem:

trip-plan (start, finish) =

    Return (choose (fly-major-airline-and-rent-car (start, finish);;
                    fly-regional-airline-and-rent-car (start, finish);;
                    take-train-and-use-public-transportation (start, finish);;
                    drive (start, finish))).

Each  of  the  four  functions  trip-plan  calls  returns  with  a  successful  value  iff  it 
succeeds  in  finding  a  plan  that  meets  the  cost  and  time  requirements.  Probably 
the first three of them are implemented as an Internet agent that visits the appropriate
Web sites, specifies the necessary parameters, and waits to see if a solution
can  be  found.  But  notice  that  trip-plan  can  return  a  result  as  soon  as  at  least  one 
of  the  four  agents  finds  an  acceptable  solution.  It  doesn’t  care  whether  the  four 
agents  can  be  run  in  parallel  or  are  tried  sequentially.  It  just  wants  to  know  if 
there's  a  solution  and,  if  so,  what  it  is. 


A good deal of the power of choose comes from the fact that it can be called recursively.
So it can be used to describe a search process, without having to specify the details
of how the search is conducted.


EXAMPLE  4.8  Nondeterministically  Searching  a  Space  of  Puzzle  Moves 

Suppose that we want to solve the 15-puzzle. We are given two configurations
of the puzzle, for example the ones shown here labeled (a) and (b). The goal is
to begin in configuration (a) and, through a sequence of moves, reach configuration
(b). The only allowable move is to slide a numbered tile into the blank
square.

Using choose, we can easily write solve-15, a program that finds a solution if
there is one. The idea is that solve-15 will guess at a first move. From the board
configuration that results from that move, it will guess at a second move. From
there, it will guess at a third move, and so on. If it reaches the goal configuration,
it will report the sequence of moves that got it there.

Using the second form of choose (in which values are selected from a set that
can be generated each time a new choice must be made), we can define solve-15
so that it returns an ordered list of board positions. The first element of the list
corresponds to the initial configuration. Following that, in order, are the configurations
that result from each of the moves. The final configuration will correspond
to the goal. So the result of a call to solve-15 will describe a move sequence that
corresponds to a solution to the original problem. We'll invoke solve-15 with a list
that contains just the initial configuration. So we define:

solve-15 (position-list) =

    /* Explore moves available from the last board configuration to have
       been generated.

    current = last (position-list).

    If current = solution then return (position-list).

    /* Assume that successors (current) returns the set of configurations
       that can be generated by one legal move from current. Then choose
       picks one with the property that, once it has been appended to
       position-list, solve-15 can continue and find a solution. We assume
       that append destructively modifies its first argument.

    choose (x from successors (current): solve-15 (append (position-list, x))).

    Return position-list.

If there is a solution to a particular instance of the 15-puzzle, solve-15 will find
it.  If  we  care  about  how  efficiently  the  solution  is  found,  then  we  can  dig  inside  the 
implementation  of  choose  and  try  various  strategies,  including: 

•  Checking  to  make  sure  we  don’t  generate  a  board  position  that  has  already 
been  explored,  or 

•  Sorting  the  successors  by  how  close  they  are  to  the  goal. 

But  if  we  don't  care  about  how  choose  works,  we  don't  have  to. 
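
If we do look inside choose, one deterministic strategy is breadth-first search with a
check for board positions that have already been explored. The Python sketch below
illustrates that strategy. It is one possible realization, not the method of the text,
and it makes several assumptions for the sake of a small runnable example: a board is a
tuple read row by row, 0 marks the blank square, and the width parameter lets the same
code run on 2 by 2 or 3 by 3 boards instead of the full 4 by 4 puzzle.

```python
from collections import deque


def successors(board, width):
    """Boards reachable by sliding one tile into the blank (0)."""
    blank = board.index(0)
    row, col = divmod(blank, width)
    height = len(board) // width
    result = []
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < height and 0 <= c < width:
            tile = r * width + c
            nxt = list(board)
            nxt[blank], nxt[tile] = nxt[tile], nxt[blank]
            result.append(tuple(nxt))
    return result


def solve(start, goal, width):
    """Return a list of boards leading from start to goal, or False."""
    frontier = deque([[start]])   # each entry is a position list
    seen = {start}                # never revisit an explored position
    while frontier:
        path = frontier.popleft()
        current = path[-1]
        if current == goal:
            return path
        for nxt in successors(current, width):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return False                  # every reachable position was explored
```

On a 3 by 3 board, solve((1, 2, 3, 4, 5, 6, 0, 7, 8), (1, 2, 3, 4, 5, 6, 7, 8, 0), 3)
returns a list of three boards. Because the set of positions is finite and none is
visited twice, this version also halts and returns False when the goal cannot be
reached, for instance when the goal differs from the start only by a swap of two tiles.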


15-puzzle configurations can be divided into two equivalence classes. Every
configuration can be transformed into every other configuration in the same
class and into none of the configurations in the other class.


Many decision problems can be solved straightforwardly using choose.


EXAMPLE 4.9 Nondeterministically Searching for a Satisfying Assignment

A wff in Boolean logic is satisfiable iff it is true for at least one assignment of
truth values to the literals it contains. Now consider the following problem, which
we’ll call SAT: Given a Boolean wff w, decide whether or not w is satisfiable.

To see how we might go about designing a program to solve the SAT problem,
consider an example wff w = P ∧ (Q ∨ R) ∧ ¬(R ∨ S) → Q. We can build a program
that considers the predicate symbols (in this case P, Q, R, and S) in some order.
For each one, it will pick one of the two available values, True or False, and assign
it to all occurrences of that predicate symbol in w. When no predicate symbols
remain, all that is necessary is to use the truth table definitions of the logical operators
to simplify w until it has evaluated to either True or False. If True, then we
have found an assignment of values to the predicates that makes w true; w is satisfiable.
If False, then this path fails to find such an assignment and it fails. This
procedure must halt because w contains only a finite number of predicate symbols,
one is eliminated at each step, and there are only two values to choose from
at each step. So either some path will return True or all paths will eventually halt
and return False.

The following algorithm returns True if the answer to the question is yes and
False if the answer to the question is no:

decideSAT (w: Boolean wff) =

    If there are no predicate symbols in w then:

        Simplify w until it is either True or False.

        Return w.

    Else:

        Find P, the first predicate symbol in w.

        /* Let w/P/x mean the wff w with every instance of P replaced
           by x.

        Return choose (decideSAT (w/P/True);;
                       decideSAT (w/P/False)).
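
The algorithm can be tried out with a small Python sketch. The representation is an
assumption made for this illustration: a wff is either a Python bool, a string naming a
predicate symbol, or a tuple whose first element is one of the hypothetical operator
tags "not", "and", "or" and "implies". The call to choose is replaced by trying the
True branch and then the False branch, which is one simple deterministic strategy.

```python
def symbols(w):
    """Predicate symbols of w, in order of first occurrence."""
    if isinstance(w, bool):
        return []
    if isinstance(w, str):
        return [w]
    found = []
    for sub in w[1:]:
        for s in symbols(sub):
            if s not in found:
                found.append(s)
    return found


def substitute(w, p, value):
    """w/p/value: w with every instance of symbol p replaced by value."""
    if isinstance(w, bool):
        return w
    if isinstance(w, str):
        return value if w == p else w
    return (w[0],) + tuple(substitute(sub, p, value) for sub in w[1:])


def simplify(w):
    """Evaluate a wff that contains no predicate symbols."""
    if isinstance(w, bool):
        return w
    op = w[0]
    args = [simplify(sub) for sub in w[1:]]
    if op == "not":
        return not args[0]
    if op == "and":
        return args[0] and args[1]
    if op == "or":
        return args[0] or args[1]
    if op == "implies":
        return (not args[0]) or args[1]
    raise ValueError(f"unknown operator: {op}")


def decide_sat(w):
    """True iff some assignment of truth values makes w true."""
    remaining = symbols(w)
    if not remaining:
        return simplify(w)
    p = remaining[0]
    # Deterministic stand-in for choose: try True first, then False.
    return decide_sat(substitute(w, p, True)) or decide_sat(substitute(w, p, False))
```

For instance, decide_sat(("and", "P", ("not", "R"))) returns True, while
decide_sat(("and", "P", ("not", "P"))) returns False. In the worst case this strategy
examines all 2^k assignments for a wff with k predicate symbols.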


One way to envision the execution of a program like solve-15 or decideSAT is as a
search tree. Each node in the tree corresponds to a snapshot of solve-15 or decideSAT
and each path from the root to a leaf node corresponds to one computation that solve-15
or decideSAT might perform. For example, if we invoke decideSAT on the input
P ∧ ¬R, the set of possible computations can be described by the tree in Figure 4.1. The
first level in the tree corresponds to guessing a value for P and the second level corresponds
to guessing a value for R.

                              P ∧ ¬R

True ∧ ¬True    True ∧ ¬False    False ∧ ¬True    False ∧ ¬False
     ||               ||               ||                ||
   False             True            False             False

FIGURE 4.1 A search tree created by decideSAT on the input P ∧ ¬R.

Since there exists at least one computational path that succeeds (i.e., returns a value
other than False), decideSAT will pick the value returned by one such path and return
it. So decideSAT will return True. It may do so after exploring all four of the paths
shown above (if it is unlucky choosing an order in which to explore the paths). Or it
may guess correctly and find the successful path without considering any of the others.


Efficient algorithms for solving Boolean satisfiability problems are important
in a wide variety of domains. No general and efficient algorithms are
known. But, in B.13, we'll describe ordered binary decision diagrams
(OBDDs), which are used in SAT solvers that work, in practice, substantially
more efficiently than decideSAT does.


One  of  the  most  important  properties  of  programs  that  exploit  choose  is  clear  from 
the simple tree that we just examined: Guesses that do not lead to a solution can be
effectively ignored in any analysis that is directed at determining the program’s result.

Does  adding  choose  to  our  programming  language  let  us  solve  any  problems  that  we 
couldn’t  solve  without  it?  The  answer  to  that  question  turns  out  to  depend  on  what  else 
the  programming  language  already  lets  us  do. 

Suppose, for example, that we are describing our programs as finite state machines
(FSMs). One way to add choose to the FSM model is to allow two or more transitions,
labeled with the same input character, to emerge from a single state. We show a simple
example in Figure 4.2.

We’ll say that a nondeterministic FSM M (i.e., one that may exploit choose) accepts
iff at least one of its paths accepts. It will reject iff all of its paths reject. So M’s job is to
find an accepting path if there is one. If it succeeds, it can ignore all other paths. If M exploits
choose and does contain competing transitions, then one way to view its behavior
is that it makes a guess and chooses an accepting path if it can.

FIGURE 4.2 A nondeterministic FSM with two competing transitions labeled a.

While  we  will  find  it  very  convenient  to  allow  nondeterminism  like  this  in  finite 
state  machines,  we  will  see  in  Section  5.4  that,  whenever  there  is  a  nondeterministic 
FSM to accept some language L, there is also a (possibly much larger and more complicated)
deterministic FSM that accepts L. So adding choose doesn’t change the class
of  languages  that  can  be  accepted. 
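
A deterministic program can follow all the guesses of a nondeterministic FSM at once by
keeping track of the set of states the machine could be in. The Python sketch below
illustrates that idea. It is not the construction of Section 5.4 itself, only a
simulation in the same spirit, and the machine it uses is a hypothetical example (it is
not the machine of Figure 4.2): it accepts strings over {a, b} that end in ab, and its
state q0 has two competing transitions labeled a.

```python
def nfa_accepts(delta, start, accepting, w):
    """Deterministically simulate a nondeterministic FSM on input w.

    delta maps (state, character) to the set of possible next states.
    The machine accepts iff at least one path ends in an accepting state.
    """
    current = {start}
    for ch in w:
        current = {
            nxt
            for state in current
            for nxt in delta.get((state, ch), ())
        }
    return bool(current & accepting)


# Hypothetical machine: from q0, reading a, either stay in q0 or guess
# that this a begins the final "ab" and move to q1.
DELTA = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
}
ACCEPTING = {"q2"}
```

With these definitions, nfa_accepts(DELTA, "q0", ACCEPTING, "aab") is True and
nfa_accepts(DELTA, "q0", ACCEPTING, "aba") is False. The simulation halts after one
pass over the input because the set of current states can never be larger than the
machine's finite set of states.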

Now  suppose  that  we  are  describing  our  programs  as  pushdown  automata  (PDAs). 
Again  we  will  add  choose  to  the  model  by  allowing  competing  transitions  coming  out 
of  a  state.  As  we  will  see  in  Chapter  13,  now  the  answer  is  that  adding  choose  adds 
power. There are languages that can be accepted by PDAs that exploit choose that cannot
be accepted by any PDA that does not exploit it.

Lastly, suppose that we are describing our programs as Turing machines or as code in a
standard, modern programming language. Then, as we will see in Chapter 17, we are back
to the situation we were in with FSMs. Nondeterminism is a very useful design tool that
lets us specify complex programs without worrying about the details of how the search is
managed. But, if there is a nondeterministic Turing machine that solves a problem, then
there is a deterministic one (one that does not exploit choose) that also solves the problem.

In  the  two  cases  (FSMs  and  Turing  machines)  in  which  adding  choose  does  not  add 
computational  power  to  our  model,  we  will  see  that  it  does  add  descriptive  power. 
We’ll  see  examples  for  which  a  very  simple  nondeterministic  machine  can  do  the  work 
of  a  substantially  more  complex  deterministic  one.  We'll  present  algorithms,  for  both 
FSMs and Turing machines, that construct, given an arbitrary nondeterministic machine,
an equivalent deterministic one. Thus we can use nondeterminism as an effective
design  tool  and  leave  the  job  of  building  a  deterministic  program  to  a  compiler. 

In  Part  V,  we  will  take  a  different  look  at  analyzing  problems  and  the  programs  that 
solve  them.  There  we  will  be  concerned  with  the  complexity  of  the  solution:  How  much 
running time does it take or how much memory does it require? In that analysis, nondeterminism
will play another important role. It will enable us to separate our solution
to  a  problem  into  two  parts: 

1.  The  complexity  of  an  individual  path  through  the  search  tree  that  choose  creates. 
Each  such  path  will  typically  correspond  to  checking  one  complete  guess  to  see  if 
it  is  a  solution  to  the  problem  we  are  trying  to  solve. 

2.  The  total  complexity  of  the  entire  search  process. 

So, although nondeterminism may at first seem at odds with our notion of effective
computation, we will find throughout this book that it is a very useful tool in helping us
to analyze problems and see how they fit into each of the models that we will consider.

For some problems, it is useful to extend choose to allow probabilities to be associated
with each of the alternatives. For example, we might write:

choose ((.5) action 1;;
        (.3) action 2;;
        (.2) action 3)

For some applications, the semantics we will want for this extended form of
choose will be that exactly one path should be pursued. Let Pr(n) be the probability
associated with alternative n. Then choose will select alternative n with probability
Pr(n). For other applications, we will want a different semantics: All paths
should be pursued and a total probability should be associated with each path as a
function of the set of probabilities associated with each step along the path. We will
have more to say about how these probabilities actually work when we talk about
specific applications.
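
The first of these two semantics can be sketched in a few lines of Python using the
standard library function random.choices. The sketch is an illustration only. The names
choose_weighted and path_probability are hypothetical, and the second function assumes
that the probability of a path is the product of the probabilities of its steps, which
is one natural reading of "a function of the set of probabilities" but is not stated in
the text.

```python
import math
import random


def choose_weighted(alternatives, rng=random):
    """Pursue exactly one alternative, picked with its stated probability.

    alternatives is a list of (probability, action) pairs, where each
    action is a function of no arguments.
    """
    weights = [probability for probability, _ in alternatives]
    actions = [action for _, action in alternatives]
    selected = rng.choices(actions, weights=weights, k=1)[0]
    return selected()


def path_probability(step_probabilities):
    """Probability of one path, assuming independent steps (an assumption)."""
    return math.prod(step_probabilities)
```

A call such as choose_weighted([(.5, action_1), (.3, action_2), (.2, action_3)]), where
the three actions are functions supplied by the caller, runs action_1 about half of the
time over many calls.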

4.3  Functions  on  Languages  and  Programs 

In Chapter 2, we described some useful functions on languages. We considered simple
functions such as complement, concatenation, union, intersection, and Kleene
star. All of those were defined by straightforward extension of the standard operations
on sets and strings. Functions on languages are not limited to those, however. In
this section, we mention a couple of others, which we'll come back to at various
points throughout this book.


EXAMPLE  4.10  The  Function  chop 

Define chop(L) = {w : ∃x ∈ L (x = x_1 c x_2 ∧ x_1 ∈ Σ_L* ∧ x_2 ∈ Σ_L* ∧ c ∈ Σ_L ∧
|x_1| = |x_2| ∧ w = x_1 x_2)}. In other words, chop(L) is all the odd length strings in L with
their middle character chopped out.

Recall the language A^nB^n = {a^n b^n : n ≥ 0}. What is chop(A^nB^n)? The answer
is ∅, since there are no odd length strings in A^nB^n.

What about A^nB^nC^n = {a^n b^n c^n : n ≥ 0}? What is chop(A^nB^nC^n)? Approximately
half of the strings in A^nB^nC^n have odd length and so can have their middle
character chopped out. Strings in A^nB^nC^n contribute strings to chop(A^nB^nC^n) as
follows:


n    in AⁿBⁿCⁿ            in chop(AⁿBⁿCⁿ)
0    ε
1    abc                  ac
2    aabbcc
3    aaabbbccc            aaabbccc
4    aaaabbbbcccc
5    aaaaabbbbbccccc      aaaaabbbbccccc

So, chop(AⁿBⁿCⁿ) = {a²ⁿ⁺¹b²ⁿc²ⁿ⁺¹ : n ≥ 0}.
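The table can be checked mechanically. The following Python sketch applies chop to a finite set of strings. Since AⁿBⁿCⁿ is infinite, the helper anbncn (a name introduced here for illustration) only builds the strings for n up to a chosen bound.

```python
def chop(language):
    """Return chop(L) for a finite language given as a set of strings."""
    result = set()
    for x in language:
        if len(x) % 2 == 1:                    # only odd length strings have a middle character
            mid = len(x) // 2
            result.add(x[:mid] + x[mid + 1:])  # x1 + x2, with the middle character c removed
    return result


def anbncn(max_n):
    """The strings a^n b^n c^n for 0 <= n <= max_n (a finite slice of the language)."""
    return {"a" * n + "b" * n + "c" * n for n in range(max_n + 1)}
```

Calling chop(anbncn(3)) produces the two strings that the first four rows of the table contribute, and chop applied to any set of even length strings is empty.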
EXAMPLE 4.11 The Function firstchars

Define firstchars(L) = {w : ∃y ∈ L (y = cx ∧ c ∈ Σ_L ∧ x ∈ Σ_L* ∧ w ∈ {c}*)}. So we
could determine firstchars(L) by looking at all the strings in L, finding all the
characters that start such strings, and then, for each such character c, adding to
firstchars(L) all the strings in {c}*. Let's look at firstchars applied to some languages:


L            firstchars(L)
∅            ∅
{ε}          ∅
{a}          {a}*
AⁿBⁿ         {a}*
{a, b}*      {a}* ∪ {b}*
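Because firstchars(L) is usually infinite, a program cannot list it, but it can test membership. The sketch below does so for a finite L given as a Python set of strings. The function names are invented for this illustration.

```python
def first_characters(language):
    """Characters that start at least one nonempty string of a finite language."""
    return {y[0] for y in language if y}


def in_firstchars(w, language):
    """True iff w is in firstchars(L), i.e. w is in {c}* for some first character c."""
    return any(set(w).issubset({c}) for c in first_characters(language))
```

Note that the empty string belongs to firstchars(L) exactly when L contains at least one nonempty string, which is why firstchars({ε}) is empty.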

Given some function f on languages, we may want to ask the question, "If L is a
member of some language class C, what can we say about f(L)? Is it too a member
of C? Alternatively, is the class C closed under f?"


EXAMPLE  4.12  Are  Language  Classes  Closed  Under  Various 
Functions? 

Consider two classes of languages, INF (the set of infinite languages) and FIN
(the set of finite languages). And consider four of the functions we have
discussed: union, intersection, chop and firstchars. We will ask the question, "Is class
C closed under function f?" The answers are (with the number in each cell
pointing to an explanation below for the corresponding answer):


                FIN        INF
union           yes (1)    yes (5)
intersection    yes (2)    no (6)
chop            yes (3)    no (7)
firstchars      no (4)     yes (8)

1. For any sets A and B, |A ∪ B| ≤ |A| + |B|.

2. For any sets A and B, |A ∩ B| ≤ min(|A|, |B|).

3. Each string in L can generate at most one string in chop(L), so
|chop(L)| ≤ |L|.

4. To show that any class C is not closed under some function f, it is sufficient
to show a single counterexample: a language L where L ∈ C but f(L) ∉ C.
We showed such a counterexample above: firstchars({a}) = {a}*.

5. For any sets A and B, |A ∪ B| ≥ |A|.

6. We show one counterexample: Let L₁ = {a}* and L₂ = {b}*. L₁ and
L₂ are infinite. But L₁ ∩ L₂ = {ε}, which is finite.

7. We have already shown a counterexample: AⁿBⁿ is infinite. But
chop(AⁿBⁿ) = ∅, which is finite.

8. If L is infinite, then it contains at least one string of length greater than 0.
That string has some first character c. Then {c}* ⊆ firstchars(L) and
{c}* is infinite.


In the rest of this book, we will discuss the four classes of languages: regular,
context-free, decidable, and semidecidable, as described in Chapter 3. One of the
questions we will ask for each of them is whether they are closed under various operations.

Given some function f on languages, how can we:

1. Implement f?

2. Show that some class of languages is closed under f?

The answer to question 2 is generally by construction. In other words, we will show
an algorithm that takes a description of the input language(s) and constructs a
description of the result of applying f to that input. We will then use that constructed
description to show that the resulting language is in the class we care about. So our
ability to answer both questions 1 and 2 hinges on our ability to define an algorithm
that computes f, given a description of its input (which is one or more languages).

In order to define an algorithm A to compute some function f, we first need a way to
define the input to A. Defining A is going to be very difficult if we allow, for example,
English descriptions of the language(s) on which A is supposed to operate. What we
need is a formal model that is exactly powerful enough to describe the languages on
which we would like A to be able to run. Then A could use the description(s) of its input
language(s) to build a new description, using the same model, of the result of applying f.


EXAMPLE  4.13  Representing  Languages  So  That  Functions 
Can  Be  Applied 

Suppose that we wish to compute the function union. It will be very hard to
implement union if we allow input language descriptions such as:

• {w ∈ {a, b}* : w has an odd number of characters}.

• {w ∈ {a, b}* : w has an even number of a's}.

• {w ∈ {a, b}* : all a's in w precede all b's}.

Suppose, on the other hand, that we describe each of these languages as a finite
state machine that accepts it. So, for example, language 1 would be
represented as

[Figure: a finite state machine whose transitions are labeled a,b.]

In Chapter 8, we will show an algorithm that, given two FSMs, corresponding to
two regular languages, L₁ and L₂, constructs a new FSM that accepts the union of
L₁ and L₂.


If we use finite state machines (or pushdown automata or Turing machines) as
input I to an algorithm A that computes some function f, then what A will do is to
manipulate those FSMs (or PDAs or Turing machines) and produce a new one that
accepts the language f(I). If we think of the input FSMs (or PDAs or Turing machines) as
programs, then A is a program whose input and output are other programs.
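To make "a program whose input and output are other programs" concrete, here is a Python sketch of one standard way to compute union on deterministic machines: run both machines in lockstep and accept if either does. It is offered under assumptions and is not necessarily the construction presented in Chapter 8. The tuple layout, the function names, and the two sample machines are choices made for this illustration, and both inputs are assumed to share one alphabet and to have a transition for every (state, symbol) pair.

```python
from itertools import product


def union_dfsm(m1, m2):
    """Build a machine accepting L(m1) ∪ L(m2) by running both in lockstep.

    Each machine is a tuple (states, alphabet, delta, start, accepting), where
    delta maps (state, symbol) to a state and is defined for every pair.
    """
    k1, sigma, d1, s1, a1 = m1
    k2, _, d2, s2, a2 = m2
    states = set(product(k1, k2))
    delta = {((p, q), c): (d1[p, c], d2[q, c]) for (p, q) in states for c in sigma}
    accepting = {(p, q) for (p, q) in states if p in a1 or q in a2}
    return states, sigma, delta, (s1, s2), accepting


def accepts(m, w):
    """Run machine m on string w and report whether it ends in an accepting state."""
    _, _, delta, state, accepting = m
    for c in w:
        state = delta[state, c]
    return state in accepting


SIGMA = {"a", "b"}
# Language 1 of Example 4.13: strings with an odd number of characters.
ODD_LENGTH = ({0, 1}, SIGMA, {(0, "a"): 1, (0, "b"): 1, (1, "a"): 0, (1, "b"): 0}, 0, {1})
# Language 2 of Example 4.13: strings with an even number of a's.
EVEN_AS = ({0, 1}, SIGMA, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}, 0, {0})
```

The output of union_dfsm has the same shape as its inputs, so it can itself be passed to union_dfsm or to any other function on machines.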


Lisp  is  a  programming  language  that  makes  it  easy  to  write  programs  that 
manipulate  programs.  (G.5) 


Programs that write other programs are not particularly common, but they are
not fundamentally different from programs that work with any other data type.
Programs in any conventional programming language can be expressed as strings,
so any program that can manipulate strings can manipulate programs.
Unfortunately, the syntax of most programming languages makes it relatively difficult to
design programs that can effectively manipulate other programs. As we will see
later, the FSM, PDA, and Turing machine formalisms that we are going to focus on
are reasonably easy to work with. Programs that perform functions on FSMs,
PDAs, and Turing machines will be an important part of the theory that we are
about to build.


Programs that write other programs play an important role in some
application areas, including mathematical modeling of such things as oil wells and
financial markets. (G.8)


Exercises

1. Describe in clear English or pseudocode a decision procedure to answer the
question, "Given a list of integers N and an individual integer n, is there any
element of N that is a factor of n?"

2. Given a Java program p and the input 0, consider the question, "Does p ever
output anything?"

a. Describe a semidecision procedure that answers this question.

b. Is there an obvious way to turn your answer to part a into a decision
procedure?

3. Recall the function chop(L), defined in Example 4.10. Let L = {w ∈ {a, b}* :
w = wᴿ}. What is chop(L)?

4. Are the following sets closed under the following operations? Prove your answer.
If a set is not closed under the operation, what is its closure under the operation?

a. L = {w ∈ {a, b}* : w ends in a} under the function odds, defined on strings
as follows: odds(s) = the string that is formed by concatenating together all
of the odd numbered characters of s. (Start numbering the characters at 1.)
For example, odds(ababbbb) = aabb.

b. FIN (the set of finite languages) under the function oddsL, defined on
languages as follows:

oddsL(L) = {w : ∃x ∈ L (w = odds(x))}.

c. INF (the set of infinite languages) under the function oddsL.

d. FIN under the function maxstring, defined in Example X.22.

e. INF under the function maxstring.

5. Let Σ = {a, b}. Let S be the set of all languages over Σ. Let f be a binary
function defined as follows:

f: S × S → S.

f(x, y) = x − y.

Answer each of the following questions and justify your answer:

a. Is f one-to-one?

b. Is f onto?

c. Is f commutative?

6. Describe a program, using choose, to:

a. Play Sudoku, described in N.2.2.

b. Solve Rubik's Cube®.


PART  II 


FINITE  STATE  MACHINES 
AND  REGULAR  LANGUAGES 


In this section, we begin our exploration of the language hierarchy.
We will start in the inner circle, which corresponds to the class of
regular languages.

We  will  explore  three  techniques, 
which  we  will  prove  are  equivalent, 
for  defining  the  regular  languages: 

•  Finite  state  machines. 

• Regular expressions.

•  Regular  grammars. 


CHAPTER  5 

Finite  State  Machines 


The simplest and most efficient computational device that we will consider is the
finite state machine (or FSM).


EXAMPLE  5.1  A  Vending  Machine 

Consider the problem of deciding when to dispense a drink from a vending
machine. To simplify the problem a bit, we'll pretend that it is still possible to buy
a drink for $.25 and we will assume that vending machines do not take pennies.
The solution that we will present for this problem can straightforwardly be
extended to modern, high-priced machines.

The vending machine controller will receive a sequence of inputs, each of which
corresponds to one of the following events.

• A coin is deposited into the machine. We can use the symbols N (for nickel), D
(for dime), and Q (for quarter) to represent these events.

• The coin return button is pushed. We can use the symbol R (for return) to
represent this event.

• A drink button is pushed and a drink is dispensed. We can use the symbol S
(for soda) for this event.

After any finite sequence of inputs, the controller will be in either:

• A dispensing state, in which it is willing to dispense a drink if a drink button is
pushed.

• A nondispensing state, in which not enough money has been inserted into the
machine.

While there is no bound on the length of the input sequence that a drink
machine may see in a week, there is only a finite amount of history that its
controller must remember in order to do its job. It needs only to be able to answer
the question, "Has enough money been inserted, since the last time a drink was
dispensed, to purchase the next drink?" It is of course possible for someone to
keep inserting money without ever pushing a dispense-drink button. But we can
design a controller that will simply reject any money that comes in after the
amount required to buy a drink has been recorded and before a drink has
actually been dispensed. We will however assume that our goal is to design a
customer-friendly drink machine. For example, the thirsty customer may have only
dimes. So we'll build a machine that will accept up to $.45. If more than the
necessary $.25 is inserted before a dispensing button is pushed, our machine will
remember the difference and leave a "credit" in the machine. So, for example, if a
customer inserts three dimes and then asks for a drink, the machine will remember
the balance of $.05.

Notice that the drink controller does not need to remember the actual
sequence of coins that it has received. It need only remember the total value of the
coins that have been inserted since the last drink was dispensed.

The drink controller that we have just described needs 10 states, corresponding
to the possible values of the credit that the customer has in the machine: 0, 5, 10,
15, 20, 25, 30, 35, 40, and 45 cents. The main structure of the controller is then:


The state that is labeled S is the start state. Transitions from one state to the
next are shown as arrows and labeled with the event that causes them to take
place. As coins are deposited, the controller's state changes to reflect the amount
of money that has been deposited. When the drink button is pushed (indicated as
S in the diagram) and the customer has a credit of less than $.25, nothing happens.
The machine's state does not change. If the drink button is pushed and the
customer has a credit of $.25 or more, the credit is decremented by $.25 and a drink is
dispensed. The drink-dispensing states, namely those that correspond to "enough
money", can be thought of as goal or accepting states. We have shown them in the
diagram with double circles.

Not all of the required transitions have been shown in the diagram. It would be
too difficult to read. We must add to the ones shown all of the following:

• From each of the accepting states, a transition back to itself labeled with each
coin value. These transitions correspond to our decision to reject additional
coins once the machine has been fed the price of a drink.

• From each state, a transition back to the start state labeled R. These transitions
will be taken whenever the customer pushes the coin return button. They
correspond to the machine returning all of the money that it has accumulated
since the last drink was dispensed.
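The controller just described can be sketched in Python by using the customer's credit, in cents, as the state. The function and constant names are choices made for this illustration. The transitions follow the rules above: coins add to the credit until the price has been reached, accepting states reject further coins, R returns to the start state, and S subtracts the price when there is enough credit.

```python
COIN_VALUE = {"N": 5, "D": 10, "Q": 25}
PRICE = 25


def step(credit, event):
    """One transition of the controller; the state is the credit in cents."""
    if event in COIN_VALUE:
        # Accepting states reject further coins (a transition back to the same state).
        return credit if credit >= PRICE else credit + COIN_VALUE[event]
    if event == "R":                 # coin return: back to the start state
        return 0
    if event == "S":                 # drink button: dispense only if there is enough credit
        return credit - PRICE if credit >= PRICE else credit
    raise ValueError(f"unknown event: {event!r}")


def run(events, credit=0):
    """Feed a sequence of events to the controller and return the final credit."""
    for event in events:
        credit = step(credit, event)
    return credit


def enough_money(events):
    """True iff the event string drives the machine to an accepting state."""
    return run(events) >= PRICE
```

For example, run("DDDS") follows three dimes and a drink request and ends with a credit of 5, matching the balance of $.05 described above.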


The drink controller that we have just described is an example of a finite state
machine. We can think of it as a device to solve a problem (dispense drinks). Or we can
think of it as a device to recognize a language (the "enough money" language that
consists of the set of strings, such as NDD, that drive the machine to an accepting state in
which a drink can be dispensed). In most of the rest of this chapter, we will take the
language recognition perspective. But it does also make sense to imagine a finite state
machine that actually acts in the world (for example, by outputting a coin or a drink). We
will return to that idea in Section 5.9.


The history of finite state machines substantially predates modern
computers. (P.1)


5.1  Deterministic  Finite  State  Machines 

A finite state machine (or FSM) is a computational device whose input is a string and
whose output is one of two values that we can call Accept and Reject. FSMs are also
sometimes called finite state automata or FSAs.

If M is an FSM, an input string is fed to M one character at a time, left to right. Each
time it receives a character, M considers its current state and the new character and
chooses a next state. One or more of M's states may be marked as accepting states. If M
runs out of input and is in an accepting state, it accepts. If, however, M runs out of input
and is not in an accepting state, it rejects. The number of steps that M executes on input
w is exactly equal to |w|, so M always halts and either accepts or rejects.

We begin by defining the class of FSMs whose behavior is deterministic. In such machines,
there is always exactly one move that can be made at each step; that move is determined by
the current state and the next input character. In Section 5.4, we will relax this restriction and
introduce nondeterministic FSMs (also called NDFSMs), in which there may, at various
points in the computation, be more than one move from which the machine may choose. We
will continue to use the term FSM to include both deterministic and nondeterministic FSMs.


A  telephone  switching  circuit  can  easily  be  modeled  as  a  DFSM. 

Formally, a deterministic FSM (or DFSM) M is a quintuple (K, Σ, δ, s, A), where:

• K is a finite set of states,

• Σ is the input alphabet,

• s ∈ K is the start state,

• A ⊆ K is the set of accepting states, and

• δ is the transition function. It maps from:

      K      ×       Σ         to      K.
    state       input symbol         state

A configuration of a DFSM M is an element of K × Σ*. Think of it as a snapshot
of M. It captures the two things that can make a difference to M's future behavior:

• Its current state.

• The input that is still left to read.

The initial configuration of a DFSM M, on input w, is (s_M, w), where s_M is the start
state of M. (We can use the subscript notation to refer to components of a machine M's
definition, although, when the context makes it clear what machine we are talking about,
we may omit the subscript.)

The transition function δ defines the operation of a DFSM M one step at a time. We
can use it to define the sequence of configurations that M will enter. We start by
defining the relation yields-in-one-step, written |-M. Yields-in-one-step relates configuration₁
to configuration₂ iff M can move from configuration₁ to configuration₂ in one step. Let
c be any element of Σ and let w be any element of Σ*. Then,

(q₁, cw) |-M (q₂, w) iff ((q₁, c), q₂) ∈ δ.

We can now define the relation yields, written |-M*, to be the reflexive, transitive
closure of |-M. So configuration C₁ yields configuration C₂ iff M can go from C₁ to C₂
in zero or more steps. In this case, we will write:

C₁ |-M* C₂.

A computation by M is a finite sequence of configurations C₀, C₁, ..., Cₙ for some
n ≥ 0 such that:

• C₀ is an initial configuration,

• Cₙ is of the form (q, ε), for some state q ∈ K_M (i.e., the entire input string has been
read), and

• C₀ |-M C₁ |-M C₂ |-M ... |-M Cₙ.

Let w be an element of Σ*. Then we will say that:

• M accepts w iff (s, w) |-M* (q, ε), for some q ∈ A_M. Any configuration (q, ε), for
some q ∈ A_M, is called an accepting configuration of M.

• M rejects w iff (s, w) |-M* (q, ε), for some q ∉ A_M. Any configuration (q, ε), for
some q ∉ A_M, is called a rejecting configuration of M.

M halts whenever it enters either an accepting or a rejecting configuration. It will do so
immediately after reading the last character of its input.

The language accepted by M, denoted L(M), is the set of all strings accepted by M.
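These definitions translate almost directly into code. In the Python sketch below, a configuration is a pair (state, remaining input), δ is a dictionary from (state, symbol) pairs to states, and each loop iteration performs one yields-in-one-step move. The function names and the small sample machine (which accepts the strings over {a, b} that end in b) are assumptions made for this illustration.

```python
def computation(delta, start, w):
    """Return the list of configurations (state, remaining input) of M on w."""
    state, rest = start, w
    configs = [(state, rest)]            # the initial configuration (s, w)
    while rest:
        state = delta[state, rest[0]]    # (q1, cw) |-M (q2, w) iff ((q1, c), q2) is in delta
        rest = rest[1:]
        configs.append((state, rest))
    return configs


def accepts(delta, start, accepting, w):
    """M accepts w iff the final configuration is (q, ε) for some accepting state q."""
    final_state, _ = computation(delta, start, w)[-1]
    return final_state in accepting


# A hypothetical two-state machine: state 1 means "the last character read was b".
ENDS_IN_B = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}
```

The list returned by computation always has |w| + 1 entries, one for the initial configuration and one for each character consumed.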
EXAMPLE 5.2 A Simple Language of a's and b's

Let L = {w ∈ {a, b}* : every a is immediately followed by a b}. L can be
accepted by the DFSM M = ({q₀, q₁, q₂}, {a, b}, δ, q₀, {q₀}), where:

δ = {((q₀, a), q₁),
     ((q₀, b), q₀),
     ((q₁, a), q₂),
     ((q₁, b), q₀),
     ((q₂, a), q₂),
     ((q₂, b), q₂)}.

The tuple notation that we have just used for δ is quite hard to read. We will
generally find it useful to draw δ as a transition diagram instead. When we do that,
we will use two conventions:

1. The start state will be indicated with an unlabeled arrow pointing into it.

2. The accepting states will be indicated with double circles.

With those conventions, a DFSM can be completely specified by a transition
diagram. So M is:


We will use the notation a, b as a shorthand for two transitions, one labeled a
and one labeled b.

As an example of M's operation, consider the input string abbabab. M's
computation is the sequence of configurations: (q₀, abbabab), (q₁, bbabab),
(q₀, babab), (q₀, abab), (q₁, bab), (q₀, ab), (q₁, b), (q₀, ε). Since q₀ is an accepting
state, M accepts.
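The computation above can be reproduced by writing δ as a Python dictionary and stepping through the input. The function names are chosen for this sketch, and every_a_followed_by_b is only a direct restatement of the definition of L, included as a cross-check on the machine.

```python
# The transition function of M from Example 5.2, written as a dictionary.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q2", ("q1", "b"): "q0",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
}
START, ACCEPTING = "q0", {"q0"}


def trace(w):
    """Return M's computation on w as a list of (state, remaining input) pairs."""
    state, configs = START, [(START, w)]
    for i, c in enumerate(w):
        state = DELTA[state, c]
        configs.append((state, w[i + 1:]))
    return configs


def m_accepts(w):
    """True iff M ends in an accepting state after reading all of w."""
    return trace(w)[-1][0] in ACCEPTING


def every_a_followed_by_b(w):
    """Direct test of the defining property of L, used only as a cross-check."""
    return all(w[i + 1:i + 2] == "b" for i, c in enumerate(w) if c == "a")
```

Calling trace("abbabab") yields the eight configurations listed in the example, with the empty string standing for ε.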


If we look at the three states in M, the machine that we just built, we see that they
are of three different sorts:

1. State q₀ is an accepting state. Every string that drives M to state q₀ is in L.

2. State q₁ is not an accepting state. But every string that drives M to state q₁ could
turn out to be in L if it is followed by an appropriate continuation string, in this
case, one that starts with a b.

3. State q₂ is what we will call a dead state. Once M enters state q₂, it will never
leave. State q₂ is not an accepting state, so any string that drives M to state q₂ has
already been determined not to be in L, no matter what comes next. We will often
name our dead states d.


EXAMPLE  5.3  Even  Length  Regions  of  a's 

Let L = {w ∈ {a, b}* : every a region in w is of even length}. L can be accepted
by the DFSM M:

[Figure: transition diagram of M.]

If M sees a b in state q₁, then there has been an a region whose length is odd.
So, no matter what happens next, M must reject. So it goes to the dead state d.


A useful way to prototype a complex system is as a finite state machine. See
P.4 for one example: the controller for a soccer-playing robot.


Because  objects  of  other  data  types  are  encoded  in  computer  memories  as  binary 
strings,  it  is  important  to  be  able  to  check  key  properties  of  such  strings. 


EXAMPLE  5.4  Checking  for  Odd  Parity 

Let L = {w ∈ {0, 1}* : w has odd parity}. A binary string has odd parity iff the
number of 1's in it is odd. So L can be accepted by the DFSM M:
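A machine for this language needs to remember only whether the number of 1's seen so far is even or odd. The Python sketch below is a reconstruction under that assumption, with state names chosen here: a 1 switches between the two states and a 0 leaves the state unchanged.

```python
def has_odd_parity(w):
    """Simulate a two-state DFSM; the state records the parity of the 1's read so far."""
    state = "even"                     # start state: zero 1's have been read
    for bit in w:
        if bit == "1":
            state = "odd" if state == "even" else "even"
        elif bit != "0":
            raise ValueError(f"not a binary digit: {bit!r}")
    return state == "odd"              # "odd" is the only accepting state
```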


One of the most important properties of finite state machines is that they are
guaranteed to halt on any input string of finite length. While this may seem obvious, it is
worth noting since, as we'll see later, more powerful computational models may not
share this property.

THEOREM 5.1 DFSMs Halt

Theorem: Every DFSM M, on input w, halts after |w| steps.

Proof: On input w, M executes some computation C₀ |-M C₁ |-M C₂ |-M ... |-M
Cₙ, where C₀ is an initial configuration and Cₙ is of the form (q, ε), for some state
q ∈ K_M. Cₙ is either an accepting or a rejecting configuration, so M will halt when
it reaches Cₙ. Each step in the computation consumes one character of w. So
n = |w|. Thus M will halt after |w| steps.


5.2  The  Regular  Languages 

We have now built DFSMs to accept four languages:

• "enough money to buy a drink",

• {w ∈ {a, b}* : every a is immediately followed by a b},

• {w ∈ {a, b}* : every a region in w is of even length}, and

• binary strings with odd parity.

These  four  languages  are  typical  of  a  large  class  of  languages  that  can  be  accepted  by 
finite  state  machines. 

We  define  the  set  of  regular  languages  to  be  exactly  those  that  can  be  accepted  by 
some  DFSM. 


EXAMPLE  5.5  No  More  Than  One  b 

Let L = {w ∈ {a, b}* : w contains no more than one b}. L is regular because it
can be accepted by the DFSM M:


Any string with more than one b will drive M to the dead state d. All other
strings will drive M to either q₀ or q₁, both of which are accepting states.


EXAMPLE  5.6  No  Two  Consecutive  Characters  Are  the  Same 

Let L = {w ∈ {a, b}* : no two consecutive characters are the same}. L is regular
because it can be accepted by the DFSM M:


The start state, q₀, is the only state in which both a and b are legal inputs. M will be
in state q₁ whenever the consecutive characters rule has not been violated and the
last character it has read was a. At that point, the only legal next character is b. M will
be in state q₂ whenever the consecutive characters rule has not been violated and the
last character it has read was b. At that point, the only legal next character is a. Any
other inputs drive M to d.


Simple  languages  of  a's  and  b's,  like  the  ones  in  the  last  two  examples,  are  useful 
for  practice  in  designing  DFSMs.  But  the  real  power  of  the  DFSM  model  comes  from 
the  fact  that  the  languages  that  arise  in  many  real-world  applications  are  regular. 


The language of universal resource identifiers (URIs), used to describe
objects on the World Wide Web, is regular. (I.3.1)


To describe less trivial languages will sometimes require DFSMs that are hard to
draw if we include the dead state. In those cases, we will omit it from our diagrams. This
doesn't mean that it doesn't exist. δ is a function that must be defined for all (state,
input) pairs. It just means that we won't bother to draw the dead state. Instead, our
convention will be that if there is no transition specified for some (state, input) pair,
then that pair drives the machine to a dead state.
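In a program, this convention amounts to completing a partial transition table. The Python sketch below fills in every missing (state, input) pair with a move to a dead state. The name make_total is invented here, the sketch assumes that the label "d" is not already used for another state, and the sample table is a reconstruction of the machine of Example 5.5 with its dead state left out.

```python
DEAD = "d"


def make_total(partial_delta, states, alphabet):
    """Fill in every missing (state, input) pair with a transition to the dead state."""
    total = dict(partial_delta)
    for q in set(states) | {DEAD}:
        for c in alphabet:
            total.setdefault((q, c), DEAD)   # keeps any transition that was specified
    return total


# No more than one b: q0 has seen no b, q1 has seen exactly one.
PARTIAL = {("q0", "a"): "q0", ("q0", "b"): "q1", ("q1", "a"): "q1"}
```

Here make_total adds three transitions: (q1, b) goes to d, and d loops to itself on both a and b.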


EXAMPLE  5.7  Floating  Point  Numbers 

Let  FLOAT  =  {w  :  w  is  the  string  representation  of  a  floating  point  number}. 
Assume  the  following  syntax  for  floating  point  numbers: 

• A floating point number is an optional sign, followed by a decimal number,
followed by an optional exponent.

• A decimal number may be of the form x or x.y, where x and y are nonempty
strings of decimal digits.

• An exponent begins with E and is followed by an optional sign and then an
integer.

• An integer is a nonempty string of decimal digits.

So, for example, these strings represent floating point numbers:

+3.0, 3.0, 0.3E1, 0.3E+1, -0.3E+1, -3E8

FLOAT is regular because it can be accepted by the DFSM:


In this diagram, we have used the shorthand d to stand for any one of the
decimal digits (0 - 9). And we have omitted the dead state to avoid arrows crossing
over each other.
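One machine that follows the syntax rules above can be written as a Python transition table. The state names q0 through q7 and the helper symbol_class are choices made for this sketch, so the states need not match the diagram one for one. As in the text, the dead state is left out: a missing table entry means reject.

```python
DIGITS = "0123456789"


def symbol_class(c):
    """Map an input character to the label used in the transition table."""
    if c in DIGITS:
        return "digit"
    if c in "+-":
        return "sign"
    if c in ".E":
        return c
    return None                       # no transition is labeled with this character


DELTA = {
    ("q0", "sign"): "q1", ("q0", "digit"): "q2",     # optional sign
    ("q1", "digit"): "q2",
    ("q2", "digit"): "q2", ("q2", "."): "q3", ("q2", "E"): "q5",
    ("q3", "digit"): "q4",                           # at least one digit after the point
    ("q4", "digit"): "q4", ("q4", "E"): "q5",
    ("q5", "sign"): "q6", ("q5", "digit"): "q7",     # optional sign of the exponent
    ("q6", "digit"): "q7",
    ("q7", "digit"): "q7",
}
ACCEPTING = {"q2", "q4", "q7"}


def is_float(w):
    """True iff w is in FLOAT according to the syntax given in Example 5.7."""
    state = "q0"
    for c in w:
        state = DELTA.get((state, symbol_class(c)))
        if state is None:             # missing entry: the machine is in its dead state
            return False
    return state in ACCEPTING
```

The machine accepts the six sample strings listed above and rejects strings such as 3. and .5, in which x or y would be empty.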


EXAMPLE  5.8  A  Simple  Communication  Protocol 

Let  L  be  a  language  that  contains  all  the  legal  sequences  of  messages  that  can  be 
exchanged  between  a  client  and  a  server  using  a  simple  communication  protocol. 
We  will  actually  consider  only  a  very  simplified  version  of  such  a  protocol,  but  the 
idea  can  be  extended  to  a  more  realistic  model. 

Let Σ_L = {Open, Request, Reply, Close}. Every string in L begins with Open
and ends with Close. In addition, every Request, except possibly the last, must be
followed by Reply and no unsolicited Reply's may occur.

L is regular because it can be accepted by the DFSM:

[Figure: transition diagram of the DFSM.]

Note that we have again omitted the dead state.
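A small Python sketch of such a machine follows. It rests on one reading of the rules, which is an assumption made here: a legal session contains exactly one Open, at the start, and exactly one Close, at the end. The state names are also invented for this sketch, and any (state, message) pair missing from the table leads to a dead state.

```python
DELTA = {
    ("start", "Open"): "idle",
    ("idle", "Request"): "waiting",
    ("waiting", "Reply"): "idle",
    ("idle", "Close"): "closed",
    ("waiting", "Close"): "closed",   # the last Request need not be answered
}


def is_legal(messages):
    """True iff the sequence of messages drives the machine to the accepting state."""
    state = "start"
    for m in messages:
        state = DELTA.get((state, m), "dead")   # unspecified pairs go to the dead state
    return state == "closed"
```

An unsolicited Reply is rejected because the idle state has no Reply transition, and two Requests in a row are rejected because the waiting state has no Request transition.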
More realistic communication protocols can also be modeled as FSMs. (I.1)


5.3  Designing  Deterministic  Finite  State  Machines 

Given some language L, how should we go about designing a DFSM to accept L? In
general, as in any design task, there is no magic bullet. But there are two related things
that it is helpful to think about:

• Imagine any DFSM M that accepts L. As a string w is being read by M, what
properties of the part of w that has been seen so far are going to have any bearing on the
ultimate answer that M needs to produce? Those are the properties that M needs to
record. So, for example, in the "enough money" machine, all that matters is the
amount of money since the last drink was dispensed. Which coins came in and the
order in which they were deposited make no difference.

• If L is infinite but M has a finite number of states, strings must "cluster". In
other words, multiple different strings will all drive M to the same state. Once
they have done that, none of their differences matter anymore. If they've driven
M to the same state, they share a fate. No matter what comes next, either all of
them cause M to accept or all of them cause M to reject. In Section 5.7 we will
show that the smallest DFSM for any language L is the one that has exactly one
state for every group of initial substrings that share a common fate. For now,
however, it helps to think about what those clusters are. We'll do that in our
next example.


A  building  security  system  can  be  described  as  a  DFSM  that  sounds  an  alarm 
if  given  an  input  sequence  that  signals  an  intruder.  (J.1) 


EXAMPLE  5.9  Even  a's,  Odd  b's 

Let  L  =  {we  {a,  b}* :  w  contains  an  even  number  of  a’s  and  an  odd  number 
of  b’s) .  To  design  a  DFSM  M  to  accept  L,  we  need  to  decide  what  history  mat¬ 
ters.  Since  Af’s  goal  is  to  separate  strings  with  even  a’s  and  odd  b’s  from  strings 
that  fail  to  meet  at  least  one  of  those  requirements,  all  it  needs  to  remember  is 
whether  the  count  of  a’s  so  far  is  even  or  odd  and  whether  the  count  of  b’s  is 
even  or  odd.  So,  since  there  are  two  clusters  based  on  the  number  of  a’s  so  far 
(even  and  odd)  and  two  clusters  based  on  the  number  of  b’s,  there  are  four 
distinct  clusters. 

That  suggests  that  we  need  a  four-state  DFSM.  Often  it  helps  to  name  the 
states  with  a  description  of  the  clusters  to  which  they  correspond.  The  following 
DFSM  M  accepts  L : 


Notice  that,  once  we  have  designed  a  machine  that  analyzes  an  input  string  with 
respect  to  some  set  of  properties  we  care  about,  it  is  relatively  easy  to  build  a  dif¬ 
ferent  machine  that  accepts  strings  based  on  different  values  of  those  properties. 
For  example,  to  change  M  so  that  it  accepts  exactly  the  strings  with  both  even  a’s 
and even b's, all we need to do is to change the accepting state.
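The four-cluster design of Example 5.9 can be sketched in code. This is a minimal illustration, not the book's diagram: the state names (pairs of parities) and the helper names `flip`, `delta`, and `accepts` are assumptions made for this sketch. It also shows the point just made: switching to the "even a's and even b's" language changes only the set of accepting states.

```python
# Each state names the cluster it stands for: (parity of a's, parity of b's).
START = ("even", "even")

def flip(parity):
    """Return the opposite parity."""
    return "odd" if parity == "even" else "even"

def delta(state, ch):
    """Transition function: reading a character flips the matching parity."""
    a_parity, b_parity = state
    if ch == "a":
        return (flip(a_parity), b_parity)
    if ch == "b":
        return (a_parity, flip(b_parity))
    raise ValueError(f"symbol {ch!r} is not in the alphabet {{a, b}}")

def accepts(w, accepting):
    """Run the DFSM on w and report whether it ends in an accepting state."""
    state = START
    for ch in w:
        state = delta(state, ch)
    return state in accepting

EVEN_A_ODD_B = {("even", "odd")}     # the language L of Example 5.9
EVEN_A_EVEN_B = {("even", "even")}   # the variant: only the accepting state differs

print(accepts("aab", EVEN_A_ODD_B))    # two a's and one b
print(accepts("aab", EVEN_A_EVEN_B))   # same run, different accepting state
```

The transition function is shared by both machines, which is why changing the language costs only a change to the accepting set.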


EXAMPLE  5.10  All  the  Vowels  in  Alphabetical  Order 

Let L = {w ∈ {a - z}* : all five vowels, a, e, i, o, and u, occur in w in alphabetical
order}. So L contains words like abstemious, facetious, and sacrilegious.
But it does not contain tenacious, which does contain all the vowels, but not in
the correct order. It is hard to write a clear, elegant program to accept L. But
designing a DFSM is simple. The following machine M does the job. In this description
of M, let the label "Σ - {a}" mean "all elements of Σ except a" and let the
label "Σ" mean "all elements of Σ".


[State diagram of M. Its self-loops are labeled Σ - {a}, Σ - {e}, Σ - {i}, Σ - {o}, and Σ - {u}.]


Notice  that  the  state  that  we  have  labeled  yes  functions  exactly  opposite  to  the 
way in which the dead state works. If M ever reaches yes, it has decided to accept
no  matter  what  comes  next. 


Sometimes an easy way to design an FSM to accept a language L is to begin by
designing an FSM to accept the complement of L. Then, as a final step, we swap the
accepting and the nonaccepting states.
EXAMPLE 5.11 A Substring that Doesn't Occur

Let L = {w ∈ {a, b}* : w does not contain the substring aab}. It is straightforward
to design an FSM that looks for the substring aab. So we can begin building
a machine to accept L by building the following machine to accept ¬L:


Then we can convert this machine into one that accepts L by making states q0, q1,
and q2 accepting and state q3 nonaccepting.
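The swap can be sketched as follows. The transition table below is an assumed reconstruction of a four-state machine that looks for aab (the state names q0 through q3 follow the text, the individual transitions are this sketch's assumption). The swap is valid here because the table is a total function: every (state, symbol) pair has exactly one successor.

```python
from itertools import product

STATES = {"q0", "q1", "q2", "q3"}

# Assumed DFSM that tracks progress toward the substring aab.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",   # nothing matched yet
    ("q1", "a"): "q2", ("q1", "b"): "q0",   # matched "a"
    ("q2", "a"): "q2", ("q2", "b"): "q3",   # matched "aa"
    ("q3", "a"): "q3", ("q3", "b"): "q3",   # "aab" has been seen
}

ACCEPT_NOT_L = {"q3"}                 # machine for the complement of L
ACCEPT_L = STATES - ACCEPT_NOT_L      # swap accepting and nonaccepting states

def accepts(w, accepting):
    """Run the DFSM from q0 and test membership of the final state."""
    state = "q0"
    for ch in w:
        state = DELTA[(state, ch)]
    return state in accepting

# Brute-force comparison against the definition of L on all short strings.
for n in range(7):
    for chars in product("ab", repeat=n):
        w = "".join(chars)
        assert accepts(w, ACCEPT_L) == ("aab" not in w)
```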


In  Section  8.3  we’ll  show  that  the  regular  languages  are  closed  under  complement 
(i.e.,  the  complement  of  every  regular  language  is  also  regular).  The  proof  will  be  by 
construction and the last step of the construction will be to swap accepting and
nonaccepting states, just as we did in the last example.

Sometimes the usefulness of the DFSM model, as we have so far defined it, breaks down
before  its  formal  power  does.  There  are  some  regular  languages  that  seem  quite  simple 
when  we  state  them  but  that  can  only  be  accepted  by  DFSMs  of  substantial  complexity. 


EXAMPLE  5.12  The  Missing  Letter  Language 

Let Σ = {a, b, c, d}. Let LMissing = {w : there is a symbol a_i ∈ Σ not appearing
in w}. LMissing is regular. We can begin writing out a DFSM M to accept it. We will
need the following states:

•  The  start  state:  all  letters  are  still  missing. 

After  one  character  has  been  read,  M  could  be  in  any  one  of: 

•  a  read,  so  b,  c,  and  d  still  missing. 

•  b  read,  so  a,  c,  and  d  still  missing. 

•  c read, so a, b, and d still missing.

•  d  read,  so  a,  b,  and  c  still  missing. 

After  a  second  character  has  been  read,  M  could  be  in  any  of  the  previous 
states  or  one  of: 

•  a  and  b  read,  so  c  and  d  still  missing. 

•  a  and  c  read,  so  b  and  d  still  missing. 

•  and so forth. There are six of these.

After a third character has been read, M could be in any of the previous states
or  one  of: 

•  a  and  b  and  c  read,  so  d  missing. 

•  a  and  b  and  d  read,  so  c  missing. 

•  a  and  c  and  d  read,  so  b  missing. 

•  b  and  c  and  d  read,  so  a  missing. 

After  a  fourth  character  has  been  read,  M  could  be  in  any  of  the  previous 
states  or: 

•  All  characters  read,  so  nothing  is  missing. 

Every state except the last is an accepting state. M is complicated but it would be
possible to write it out. Now imagine that Σ were the entire English alphabet. It
would still be possible to write out a DFSM to accept LMissing, but it would be so
complicated it would be hard to get it right. The DFSM model is no longer very useful.
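The state list in Example 5.12 has one state per set of letters read so far, so it can be enumerated mechanically. The sketch below is an illustration under that reading; the function names `dfsm_states` and `accepts` are hypothetical. It confirms the count 1 + 4 + 6 + 4 + 1 = 16 for the four-letter alphabet and shows why the same design needs 2^26 states for the English alphabet.

```python
from itertools import combinations

def dfsm_states(sigma):
    """One state per subset of sigma: the set of letters read so far."""
    letters = sorted(set(sigma))
    return [frozenset(combo)
            for size in range(len(letters) + 1)
            for combo in combinations(letters, size)]

def accepts(w, sigma):
    """Simulate the DFSM: the current state is the set of letters seen."""
    alphabet = frozenset(sigma)
    seen = frozenset()
    for ch in w:
        if ch not in alphabet:
            raise ValueError(f"symbol {ch!r} is not in the alphabet")
        seen = seen | {ch}
    # Every state except "all characters read" is accepting.
    return seen != alphabet

states = dfsm_states("abcd")
print(len(states))   # 16 states for the four-letter alphabet
print(2 ** 26)       # state count of the same design over 26 letters
```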


5.4  Nondeterministic  FSMs 

To  solve  the  problem  that  we  just  encountered  in  the  missing  letter  example,  we  will 
modify  our  definition  of  an  FSM  to  allow  nondeterminism.  Recall  our  discussion  of 
nondeterminism  in  Section  4.2.  We  will  now  introduce  our  first  specific  use  of  the  ideas 
we discussed there. We'll see that we can easily build a nondeterministic FSM M to
accept LMissing. Any string in LMissing must be missing at least one letter. We'll design M so
that it simply guesses at which letter that is. If there is a missing letter, then at least one
of M's guesses will be right and the corresponding path will accept. So M will accept.

5.4.1  What  Is  a  Nondeterministic  FSM? 

A nondeterministic FSM (or NDFSM) M is a quintuple (K, Σ, Δ, s, A), where:

•  K is a finite set of states,

•  Σ is an alphabet,

•  s ∈ K is the start state,

•  A ⊆ K is the set of final states, and

•  Δ is the transition relation. It is a finite subset of (K × (Σ ∪ {ε})) × K.

In other words, each element of Δ contains a (state, input symbol or ε) pair, and a
new state.

We define configuration, initial configuration, accepting configuration, yields-in-one-step,
yields, and computation analogously to the way that we defined them for DFSMs.
Let w be an element of Σ*. Then we will say that:

•  M  accepts  w  iff  at  least  one  of  its  computations  accepts. 

•  M  rejects  w  iff  none  of  its  computations  accepts. 
The language accepted by M, denoted L(M), is the set of all strings accepted by M.

There  are  two  key  differences  between  DFSMs  and  NDFSMs.  In  every  configuration, 
a DFSM can make exactly one move. However, because Δ can be an arbitrary relation
(that  may  not  also  be  a  function),  that  is  not  necessarily  true  for  an  NDFSM.  Instead: 

•  An  NDFSM  M  may  enter  a  configuration  in  which  there  are  still  input  symbols  left 
to  read  but  from  which  no  moves  are  available.  Since  any  sequence  of  moves  that 
leads  to  such  a  configuration  cannot  ever  reach  an  accepting  configuration,  M  will 
simply halt without accepting. This situation is possible because Δ is not a function.
So  there  can  be  (state,  input)  pairs  for  which  no  next  state  is  defined. 

•  An  NDFSM  M  may  enter  a  configuration  from  which  two  or  more  competing 
moves  are  possible.  The  competition  can  come  from  either  or  both  of  the  following 
properties  of  the  transition  relation  of  an  NDFSM: 

-  An NDFSM M may have one or more transitions that are labeled ε, rather than
being labeled with a character from Σ. An ε-transition out of state q may (but
need not) be followed, without consuming any input, whenever M is in state q.
So an ε-transition from a state q competes with all other transitions out of q.
One way to think about the usefulness of ε-transitions is that they enable M to
guess at the correct path before it actually sees the input. Wrong guesses will
generate paths that will fail but that can be ignored.

-  Out of some state q, there may be more than one transition with a given label.
These competing transitions give M another way to guess at a correct path.

Consider the fragment, shown in Figure 5.1, of an NDFSM M. If M is in state q0 and
the next input character is an a, then there are three moves that M could make:

1.  It can take the ε-transition to q1 before it reads the next input character,

2.  It can read the next input character and take the transition to q2, or

3.  It can read the next input character and take the transition to q3.

One  way  to  envision  the  operation  of  M  is  as  a  tree,  as  shown  in  Figure  5.2.  Each 
node  in  the  tree  corresponds  to  a  configuration  of  M.  Each  path  from  the  root  corre¬ 
sponds  to  a  sequence  of  moves  that  M  might  make.  Each  path  that  leads  to  a  configura¬ 
tion  in  which  the  entire  input  string  has  been  read  corresponds  to  a  computation  of  M. 

An  alternative  is  to  imagine  following  all  paths  through  M  in  parallel.  Think  of  M 
as  being  in  a  set  of  states  at  each  step  of  its  computation.  If,  when  M  runs  out  of  input, 
the  set  of  states  that  it  is  in  contains  at  least  one  accepting  state,  then  M  will  accept. 


FIGURE  5.1  An  NDFSM  with  two  kinds  of 
nondeterminism. 




FIGURE  5.2  Viewing  nondeterminism  as  search  through  a  space  of  computation  paths. 


EXAMPLE  5.13  An  Optional  Initial  a 

Let L = {w ∈ {a, b}* : w is made up of an optional a followed by aa followed by
zero or more b's}. The following NDFSM M accepts L:

M may (but is not required to) follow the ε-transition from state q0 to state q1
before it reads the first input character. In effect, it must guess whether or not the
optional a is present.


EXAMPLE  5.14  Two  Different  Sublanguages 

Let L = {w ∈ {a, b}* : w = aba or |w| is even}. An easy way to build an FSM to
accept this language is to build FSMs for each of the individual sublanguages and
then "glue" them together with ε-transitions. In essence, the machine guesses,
when processing a string, which sublanguage the string might be in. So we have:


The upper machine accepts {w ∈ {a, b}* : w = aba}. The lower one accepts
{w ∈ {a, b}* : |w| is even}.




By exploiting nondeterminism, it may be possible to build a simple FSM to accept a
language for which the smallest deterministic FSM is complex. A good example of a language
for which this is true is the missing letter language that we considered in Example 5.12.


EXAMPLE  5.15  The  Missing  Letter  Language,  Again 

Let Σ = {a, b, c, d}. LMissing = {w : there is a symbol a_i ∈ Σ not appearing in
w}. The following simple NDFSM M accepts LMissing:


M works by guessing which letter is going to be the missing one. If any of its
guesses is right, it will accept. If all of them are wrong, then all paths will fail and M
will reject.
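The guessing strategy of Example 5.15 can be mimicked deterministically by trying each guess in turn. The sketch below is only an illustration of the accept-if-any-path-accepts rule, not the machine itself; it assumes each branch of M loops on every letter except the guessed one, and the name `accepts_by_guessing` is hypothetical.

```python
def accepts_by_guessing(w, sigma="abcd"):
    """Accept iff at least one guess about the missing letter survives all of w.

    Each iteration of the outer loop plays the role of one branch of M:
    the branch for `guess` can read any letter except `guess`, so it
    blocks (fails) as soon as the guessed letter shows up in the input.
    """
    for guess in sigma:
        if all(ch != guess for ch in w):   # this branch never blocks
            return True                    # one accepting path is enough
    return False                           # every guess was wrong

print(accepts_by_guessing("abcabc"))   # the guess d is never contradicted
print(accepts_by_guessing("abcd"))     # every guess is contradicted
```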


5.4.2  NDFSMs  for  Pattern  and  Substring  Matching 

Nondeterministic  FSMs  are  a  particularly  effective  way  to  define  simple  machines  to 
search  a  text  string  for  one  or  more  patterns  or  substrings. 


EXAMPLE  5.16  Exploiting  Nondeterminism  for  Keyword  Matching 

Let L = {w ∈ {a, b, c}* : ∃x, y ∈ {a, b, c}* (w = x abcabb y)}. In other words, w
must contain at least one occurrence of the substring abcabb. The following
DFSM M1 accepts L:


While M1 works, and it works efficiently, designing machines like M1 and getting
them  right  is  hard.  The  spaghetti-like  transitions  are  necessary  because,  whenever 
a  match  fails,  it  is  possible  that  another  partial  match  has  already  been  found. 


But  now  consider  the  following  NDFSM  M2,  which  also  accepts  L: 


The idea here is that, whenever M2 sees an a, it may guess that it is at the beginning
of the pattern abcabb. Or, on any input character (including a), it may guess
that it is not yet at the beginning of the pattern (so it stays in q0). If it ever reaches
q6, it will stay there until it has finished reading the input. Then it will accept.


Of  course,  practical  string  search  engines  need  to  be  small  and  deterministic.  But 
NDFSMs  like  the  one  we  just  built  can  be  used  as  the  basis  for  constructing  such 
efficient  search  machines.  In  Section  5.4.4,  we  will  describe  an  algorithm  that  con¬ 
verts  an  arbitrary  NDFSM  into  an  equivalent  DFSM.  It  is  likely  that  that  machine 
will have more states than it needs. But, in Section 5.7, we will present an algorithm
that takes an arbitrary DFSM and produces an equivalent minimal one (i.e., one
with the smallest number of states). So one effective way to build a correct and
efficient string-searching machine is to build a simple NDFSM, convert it to an
equivalent DFSM, and then minimize the result. One alternative to this three-step
process is the Knuth-Morris-Pratt string search algorithm, which we will present in
Example  27.5. 


String  searching  is  a  fundamental  operation  in  every  word  processing  or  text 
editing  system. 


Now  suppose  that  we  have  not  one  pattern  but  several.  Hand  crafting  a  DFSM  may  be 
even more difficult. One alternative is to use a specialized, keyword-search FSM-building
algorithm that we will present in Section 6.2.4. Another is to build a simple
NDFSM, as we show in the next example.
EXAMPLE 5.17 Multiple Keywords

Let L = {w ∈ {a, b}* : ∃x, y ∈ {a, b}* ((w = x abbaa y) ∨ (w = x baba y))}. In
other words, w contains at least one occurrence of the substring abbaa or the substring
baba. The following NDFSM M accepts L:


The  idea  here  is  that,  whenever  M  sees  an  a,  it  may  guess  that  it  is  at  the  beginning 
of  the  substring  abbaa.  Whenever  it  sees  a  b,  it  may  guess  that  it  is  at  the  begin¬ 
ning  of  the  substring  baba.  Alternatively,  on  either  a  or  b,  it  may  guess  that  it  is 
not  yet  at  the  beginning  of  either  substring  (so  it  stays  in  q0). 


NDFSMs  are  also  a  natural  way  to  search  for  other  kinds  of  patterns,  as  we  can  see 
in  the  next  example. 


EXAMPLE  5.18  Other  Kinds  of  Patterns 

Let L = {w ∈ {a, b}* : the fourth from the last character is a}. The following
NDFSM  M  accepts  L : 


The  idea  here  is  that,  whenever  it  sees  an  a,  one  of  M's  paths  guesses  that  it  is  the 
fourth  from  the  last  character  (and  so  proceeds  along  the  path  that  will  read  the 
last  three  remaining  characters).  The  other  path  guesses  that  it  is  not  (and  so  stays 
in  the  start  state). 


It  is  enlightening  to  try  designing  DFSMs  for  the  last  two  examples.  We  leave  that  as 
an exercise. If you try it, you'll appreciate the value of the NDFSM model as a
high-level tool for describing complex systems.
5.4.3 Analyzing Nondeterministic FSMs

Given an NDFSM M, such as any of the ones we have just considered, how can we
analyze it to determine what strings it accepts? One way is to do a depth-first search of
the paths through the machine. Another is to imagine tracing the execution of the
original NDFSM M by following all paths in parallel. To do that, think of M as being in a set
of states at each step of its computation. For example, consider again the NDFSM that
we built for Example 5.17. You may find it useful to trace the process we are about to
describe by using several fingers. Or, when fingers run out, use a coin on each active
state. Initially, M is in q0. If it sees an a, it can loop to state q0 or go to q1. So we will
think of it as being in the set of states {q0, q1} (thus we need two fingers or two coins).
Suppose it sees a b next. From q0, it can go to q0 or q6. From q1, it can go to q2. So, after
seeing the string ab, M is in {q0, q2, q6} (three fingers or three coins). Suppose it sees a
b next. From q0, it can go to q0 or q6. From q2, it can go to q3. From q6, it can go
nowhere. So, after seeing abb, M is in {q0, q3, q6}. And so forth. If, when all the input
has been read, M is in at least one accepting state (in this case, q5 or q9), then it accepts.
Otherwise it rejects.
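The finger-and-coin trace can be replayed in code. The transition table below is an assumed reconstruction of the Example 5.17 machine, inferred from the trace above: q1 through q5 spell out abbaa, q6 through q9 spell out baba, and q5 and q9 are taken to be accepting states that loop on every input. The function names are hypothetical.

```python
# Transition relation: (state, symbol) -> set of possible next states.
DELTA = {
    ("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0", "q6"},
    ("q1", "b"): {"q2"}, ("q2", "b"): {"q3"},          # a b b ...
    ("q3", "a"): {"q4"}, ("q4", "a"): {"q5"},          # ... a a
    ("q5", "a"): {"q5"}, ("q5", "b"): {"q5"},          # abbaa found
    ("q6", "a"): {"q7"}, ("q7", "b"): {"q8"},          # b a b ...
    ("q8", "a"): {"q9"},                               # ... a
    ("q9", "a"): {"q9"}, ("q9", "b"): {"q9"},          # baba found
}
ACCEPTING = {"q5", "q9"}

def trace(w):
    """Return the set of states M occupies after each prefix of w."""
    current = {"q0"}
    history = [current]
    for ch in w:
        # Follow every arc labeled ch out of every active state;
        # states with no such arc simply drop out of the set.
        current = set().union(*(DELTA.get((q, ch), set()) for q in current))
        history.append(current)
    return history

def accepts(w):
    """M accepts iff the final state set contains an accepting state."""
    return bool(trace(w)[-1] & ACCEPTING)

for step in trace("abb"):
    print(sorted(step))
```

Each printed set corresponds to one placement of fingers or coins in the trace described above.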

Handling ε-Transitions

But how shall we handle ε-transitions? The construction that we just sketched assumes
that all paths have read the same number of input symbols. But if, from some state q,
one transition is labeled ε and another is labeled with some element of Σ, M consumes
no input as it takes the first transition and one input symbol as it takes the second
transition. To solve this problem, we introduce the function eps: K_M → 𝒫(K_M). We define
eps(q), where q is some state in M, to be the set of states of M that are reachable from
q by following zero or more ε-transitions. Formally:

eps(q) = {p ∈ K : (q, w) ⊢M* (p, w)}.

Alternatively, eps(q) is the closure of {q} under the relation {(p, r) : there is a
transition (p, ε, r) ∈ Δ}. The following algorithm computes eps:

eps(q:  state)  = 

1.  result = {q}.

2.  While there exists some p ∈ result and some r ∉ result and some transition
(p, ε, r) ∈ Δ do: Insert r into result.

3.  Return  result. 

This algorithm is guaranteed to halt because, each time through the loop, it adds an
element to result. It must halt when there are no elements left to add. Since there is
only a finite number of candidate elements, namely the finite set of states in M, and no
element can be added more than once, the algorithm must eventually run out of
elements to add, at which point it must halt. It correctly computes eps(q) because, by the
condition associated with the while loop:

•  It can add no element that is not reachable from q following only ε-transitions.

•  It will add all elements that are reachable from q following only ε-transitions.
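A direct transcription of the eps algorithm into code might look like the sketch below. It assumes that Δ is stored as a set of (state, label, state) triples and that ε is represented by `None`; both choices, and the small machine used to exercise it, are assumptions of this illustration.

```python
EPSILON = None  # stand-in label for ε-transitions (assumption of this sketch)

def eps(q, delta):
    """Return the set of states reachable from q by zero or more ε-transitions.

    delta is the transition relation as a set of (p, label, r) triples.
    """
    result = {q}                                   # step 1
    changed = True
    while changed:                                 # step 2
        changed = False
        for (p, label, r) in delta:
            # Look for an ε-transition from inside result to outside it.
            if label is EPSILON and p in result and r not in result:
                result.add(r)
                changed = True
    return result                                  # step 3

# Hypothetical machine with an ε-loop q0 -> q1 -> q2 -> q0.
DELTA = {
    ("q0", EPSILON, "q1"),
    ("q1", EPSILON, "q2"),
    ("q2", EPSILON, "q0"),
    ("q2", "a", "q3"),
}

print(sorted(eps("q0", DELTA)))
print(sorted(eps("q3", DELTA)))
```

The `r not in result` test is what prevents the ε-loop from being traversed forever: each pass through the loop either adds a new state or stops.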
EXAMPLE 5.19 Computing eps

Consider the following NDFSM M:


To compute eps(q0), we initially set result to {q0}. Then q1 is added, producing
{q0, q1}. Then q2 is added, producing {q0, q1, q2}. There is an ε-transition from q2
to q0, but q0 is already in result. So the computation of eps(q0) halts.

The result of running eps on each of the states of M is:

eps(q0) = {q0, q1, q2}.

eps(q1) = {q0, q1, q2}.

eps(q2) = {q0, q1, q2}.

eps(q3) = {q3}.


Example  5.19  illustrates  clearly  why  we  chose  to  define  the  eps  function,  rather  than 
treating ε-transitions like other transitions and simply following them whenever we could.
The machine we had to consider in that example contains what we might choose to call an
ε-loop: a loop that can be traversed by following only ε-transitions. Since such transitions
consume  no  input,  there  is  no  limit  to  the  number  of  times  the  loop  could  be  traversed.  So, 
if  we  were  not  careful,  it  would  be  easy  to  write  a  simulation  algorithm  that  did  not  halt. 
The  algorithm  that  we  presented  for  eps  halts  whenever  it  runs  out  of  unvisited  states  to 
add,  which  must  eventually  happen  since  the  set  of  states  is  finite. 

A  Simulation  Algorithm 

With the eps function in hand, we can now define an algorithm for tracing all paths in
parallel through an NDFSM M:

ndfsmsimulate(M: NDFSM, w: string) =

1.  current-state = eps(s).    /* Start in the set that contains M's start state
                                  and any other states that can be reached
                                  from it following only ε-transitions.

2.  While any input symbols in w remain to be read do:

    2.1.  c = get-next-symbol(w).

    2.2.  next-state = ∅.

    2.3.  For each state q in current-state do:

              For each state p such that (q, c, p) ∈ Δ do:
                  next-state = next-state ∪ eps(p).

    2.4.  current-state = next-state.

3.  If current-state contains any states in A, accept. Else reject.

Step  2.3  is  the  core  of  the  simulation  algorithm.  It  says:  Follow  every  arc  labeled  c  from 
every state in current-state. Then compute next-state (and thus the new value of
current-state) so that it includes every state that is reached in that process, plus every state that
can be reached by following ε-transitions from any of those states. For more on how this
step can be implemented, see the more detailed description of ndfsmsimulate that we
present  in  Section  5.6.2. 
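As a rough illustration of ndfsmsimulate, the sketch below follows the numbered steps one for one. It is not the more detailed version promised for Section 5.6.2. The representation (Δ as a set of triples, ε as `None`) and the test machine, an assumed reconstruction of the "optional initial a" machine of Example 5.13, are this sketch's own choices.

```python
EPSILON = None  # stand-in label for ε-transitions (assumption of this sketch)

def eps(q, delta):
    """States reachable from q by zero or more ε-transitions."""
    result, frontier = {q}, [q]
    while frontier:
        p = frontier.pop()
        for (src, label, dst) in delta:
            if src == p and label is EPSILON and dst not in result:
                result.add(dst)
                frontier.append(dst)
    return result

def ndfsmsimulate(delta, start, accepting, w):
    """Trace all paths in parallel; return True iff some path accepts w."""
    current_state = eps(start, delta)                    # step 1
    for c in w:                                          # steps 2 and 2.1
        next_state = set()                               # step 2.2
        for q in current_state:                          # step 2.3
            for (src, label, p) in delta:
                if src == q and label == c:
                    next_state |= eps(p, delta)
        current_state = next_state                       # step 2.4
    return bool(current_state & accepting)               # step 3

# Assumed machine for: an optional a, then aa, then zero or more b's.
DELTA = {
    ("q0", EPSILON, "q1"),   # guess that the optional a is absent
    ("q0", "a", "q1"),       # or read the optional a
    ("q1", "a", "q2"),
    ("q2", "a", "q3"),
    ("q3", "b", "q3"),
}

for w in ["aa", "aaa", "aab", "aaabb", "a", "aaaa", "b"]:
    print(w, ndfsmsimulate(DELTA, "q0", {"q3"}, w))
```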

5.4.4  The  Equivalence  of  Nondeterministic  and  Deterministic  FSMs 

In  this  section,  we  explore  the  relationship  between  the  DFSM  and  NDFSM  models 
that  we  have  just  defined. 

THEOREM  5.2  If  There  is  a  DFSM  for  L,  There  is  an  NDFSM  for  L 
Theorem:  For  every  DFSM  there  is  an  equivalent  NDFSM. 

Proof: Let M be a DFSM that accepts some language L. M is also an NDFSM that
happens to contain no ε-transitions and whose transition relation happens to be a
function. So the NDFSM that we claim must exist is simply M.

But  what  about  the  other  direction?  The  nondeterministic  model  that  we  have  just 
introduced  makes  it  substantially  easier  to  build  FSMs  to  accept  some  kinds  of  lan¬ 
guages,  particularly  those  that  involve  looking  for  instances  of  complex  patterns.  But 
real  computers  are  deterministic.  What  does  the  existence  of  an  NDFSM  to  accept  a 
language L tell us about the existence of a deterministic program to accept L? The
answer is given by the following theorem:

THEOREM  5.3  If  There  is  an  NDFSM  for  L,  There  is  a  DFSM  for  L 

Theorem: Given an NDFSM M = (K, Σ, Δ, s, A) that accepts some language L,
there exists an equivalent DFSM that accepts L.

Proof: The proof is by construction of an equivalent DFSM M'. The construction
is based on the function eps and on the simulation algorithm that we
described in the last section. The states of M' will correspond to sets of states in M.
So M' = (K', Σ, δ', s', A'), where:
•  K' contains one state for each element of 𝒫(K).

•  s' = eps(s).

•  A' = {Q ⊆ K : Q ∩ A ≠ ∅}.

•  δ'(Q, c) = ∪ {eps(p) : ∃q ∈ Q ((q, c, p) ∈ Δ)}.

We  should  note  the  following  things  about  this  definition: 

•  In principle, there is one state in K' for each element of 𝒫(K). However, in
most  cases,  many  of  those  states  will  be  unreachable  from  s'  (and  thus  unnec¬ 
essary).  So  we  will  present  a  construction  algorithm  that  creates  states  only  as 
it  needs  to. 

•  We'll name each state in K' with the element of 𝒫(K) to which it corresponds.
That  will  make  it  relatively  straightforward  to  see  how  the  construction  works. 
But  keep  in  mind  that  those  labels  are  just  names.  We  could  have  called  them 
anything. 

•  To decide whether a state in K' is an accepting state, we see whether it corresponds
to an element of 𝒫(K) that contains at least one element of A, i.e., one
accepting state from K.

•  M' accepts whenever it runs out of input and is in a state that contains at least
one  accepting  state  of  M.  Thus  it  implements  the  definition  of  an  NDFSM, 
which  accepts  iff  at  least  one  path  through  it  accepts. 

•  The definition of δ' corresponds to step 2.3 of the simulation algorithm we
presented  above. 

The following algorithm computes M' given M:

ndfsmtodfsm(M: NDFSM) =

1.  For each state q in K do:

        Compute eps(q).    /* These values will be used below.

2.  s' = eps(s).

3.  Compute δ':

    3.1.  active-states = {s'}.    /* We will build a list of all states that are
                                      reachable from the start state. Each element of
                                      active-states is a set of states drawn from K.

    3.2.  δ' = ∅.

    3.3.  While there exists some element Q of active-states for which δ' has not
          yet been computed do:

              For each character c in Σ do:
                  new-state = ∅.

                  For each state q in Q do:
                      For each state p such that (q, c, p) ∈ Δ do:
                          new-state = new-state ∪ eps(p).

                  Add the transition (Q, c, new-state) to δ'.

                  If new-state ∉ active-states then insert it into active-states.

4.  K' = active-states.

5.  A' = {Q ∈ K' : Q ∩ A ≠ ∅}.

The  core  of  ndfsmtodfsm  is  the  loop  in  step  3.3.  At  each  step  through  it,  we 
pick a state that we know is reachable from the start state but from which we
have not yet computed transitions. Call it Q. Then compute the paths from Q for
each element c of the input alphabet as follows: Q is a set of states in the original
NDFSM M. So consider each element q of Q. Find all transitions from q labeled
c. For each state p that is reached by such a transition, find all additional states
that are reachable by following only ε-transitions from p. Let new-state be the set
that  contains  all  of  those  states.  Now  we  know  that  whenever  M'  is  in  Q  and  it 
reads  a  c,  it  should  go  to  new-state. 

The  algorithm  ndfsmtodfsm  halts  on  all  inputs  and  constructs  a  DFSM  M'  that 
accepts exactly L(M), the language accepted by M.
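One way ndfsmtodfsm might be realized is sketched below. It assumes Δ is a set of (state, label, state) triples with ε represented by `None`, and it names DFSM states with frozensets of NDFSM states. The test machine is an assumed reconstruction of the "fourth from the last character is a" machine of Example 5.18; its state names are hypothetical.

```python
EPSILON = None  # stand-in label for ε-transitions (assumption of this sketch)

def eps(q, delta):
    """States reachable from q by zero or more ε-transitions."""
    result, frontier = {q}, [q]
    while frontier:
        p = frontier.pop()
        for (src, label, dst) in delta:
            if src == p and label is EPSILON and dst not in result:
                result.add(dst)
                frontier.append(dst)
    return result

def ndfsmtodfsm(sigma, delta, start, accepting):
    """Build only the DFSM states that are reachable from the start state."""
    start_set = frozenset(eps(start, delta))          # step 2
    active_states = [start_set]                       # step 3.1
    dfsm_delta = {}                                   # step 3.2
    i = 0
    while i < len(active_states):                     # step 3.3
        Q = active_states[i]
        i += 1
        for c in sigma:
            new_state = set()
            for q in Q:
                for (src, label, p) in delta:
                    if src == q and label == c:
                        new_state |= eps(p, delta)
            new_state = frozenset(new_state)
            dfsm_delta[(Q, c)] = new_state
            if new_state not in active_states:
                active_states.append(new_state)
    dfsm_accepting = {Q for Q in active_states if Q & accepting}   # step 5
    return active_states, dfsm_delta, start_set, dfsm_accepting

# Assumed NDFSM: q0 loops on a and b, and may guess on an a that it is
# the fourth character from the end (q1), then reads exactly three more.
DELTA = {("q0", "a", "q0"), ("q0", "b", "q0"), ("q0", "a", "q1")}
DELTA |= {(f"q{i}", c, f"q{i + 1}") for i in (1, 2, 3) for c in "ab"}

K, d, s, A = ndfsmtodfsm("ab", DELTA, "q0", {"q4"})

def dfsm_accepts(w):
    """Run the constructed DFSM on w."""
    state = s
    for c in w:
        state = d[(state, c)]
    return state in A

print(len(K))                 # number of reachable DFSM states
print(dfsm_accepts("abbb"))   # fourth from the last character is a
```

The list `active_states` doubles as a work queue: states are processed in discovery order, and a state is appended only the first time it is produced, which mirrors the test in step 3.3.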


A  rigorous  construction  proof  requires  a  proof  that  the  construction  al¬ 
gorithm  is  correct.  We  will  generally  omit  the  details  of  such  proofs.  But 
we  show  them  for  this  case  as  an  example  of  what  these  proofs  look  like. 
(Appendix  C) 


The  algorithm  ndfsmtodfsm  is  important  for  two  reasons: 

•  It  proves  the  theorem  that,  for  every  NDFSM  there  exists  an  equivalent  DFSM. 

•  It  lets  us  use  nondeterminism  as  a  design  tool,  even  though  we  may  ultimately  need 
a deterministic machine. If we have an implementation of ndfsmtodfsm, then, if we
can  build  an  NDFSM  to  solve  our  problem,  ndfsmtodfsm  can  easily  construct  an 
equivalent  DFSM. 


EXAMPLE  5.20  Using  ndfsmtodfsm  to  Build  a  Deterministic  FSM 

Consider the NDFSM M shown on the next page. To get a feel for M, simulate it
on  the  input  string  bbbacb,  using  coins  to  keep  track  of  the  states  it  enters. 

We  can  apply  ndfsmtodfsm  to  M  as  follows: 

1.  Compute eps(q) for each state q in K_M:

eps(q1) = {q1, q2, q7}, eps(q2) = {q2, q7}, eps(q3) = {q3}, eps(q4) = {q4},
eps(q5) = {q5}, eps(q6) = {q2, q6, q7}, eps(q7) = {q7}, eps(q8) = {q8}.

2.  s' = eps(s) = {q1, q2, q7}.

3.  Compute δ':

active-states = {{q1, q2, q7}}. Consider {q1, q2, q7}:

(({q1, q2, q7}, a), ∅).
(({q1, q2, q7}, b), {q1, q2, q3, q5, q7, q8}).
(({q1, q2, q7}, c), ∅).

active-states = {{q1, q2, q7}, ∅, {q1, q2, q3, q5, q7, q8}}. Consider ∅:

((∅, a), ∅).    /* ∅ is a dead state and we will generally omit it.
((∅, b), ∅).
((∅, c), ∅).

active-states = {{q1, q2, q7}, ∅, {q1, q2, q3, q5, q7, q8}}. Consider
{q1, q2, q3, q5, q7, q8}:

(({q1, q2, q3, q5, q7, q8}, a), {q2, q4, q6, q7}).
(({q1, q2, q3, q5, q7, q8}, b), {q1, q2, q3, q5, q6, q7, q8}).
(({q1, q2, q3, q5, q7, q8}, c), {q4}).

active-states = {{q1, q2, q7}, ∅, {q1, q2, q3, q5, q7, q8}, {q2, q4, q6, q7},
{q1, q2, q3, q5, q6, q7, q8}, {q4}}. Consider {q2, q4, q6, q7}:

(({q2, q4, q6, q7}, a), ∅).
(({q2, q4, q6, q7}, b), {q3, q5, q8}).
(({q2, q4, q6, q7}, c), {q2, q7}).

active-states = {{q1, q2, q7}, ∅, {q1, q2, q3, q5, q7, q8}, {q2, q4, q6, q7},
{q1, q2, q3, q5, q6, q7, q8}, {q4}, {q3, q5, q8}, {q2, q7}}.
Consider {q1, q2, q3, q5, q6, q7, q8}:

(({q1, q2, q3, q5, q6, q7, q8}, a), {q2, q4, q6, q7}).
(({q1, q2, q3, q5, q6, q7, q8}, b), {q1, q2, q3, q5, q6, q7, q8}).
(({q1, q2, q3, q5, q6, q7, q8}, c), {q2, q4, q7}).

active-states = {{q1, q2, q7}, ∅, {q1, q2, q3, q5, q7, q8}, {q2, q4, q6, q7},
{q1, q2, q3, q5, q6, q7, q8}, {q4}, {q3, q5, q8}, {q2, q7}, {q2, q4, q7}}. Consider {q4}:

(({q4}, a), ∅).
(({q4}, b), ∅).
(({q4}, c), {q2, q7}).

active-states did not change. Consider {q3, q5, q8}:

(({q3, q5, q8}, a), {q2, q4, q6, q7}).
(({q3, q5, q8}, b), {q2, q6, q7}).
(({q3, q5, q8}, c), {q4}).

active-states = {{q1, q2, q7}, ∅, {q1, q2, q3, q5, q7, q8}, {q2, q4, q6, q7},
{q1, q2, q3, q5, q6, q7, q8}, {q4}, {q3, q5, q8}, {q2, q7}, {q2, q4, q7}, {q2, q6, q7}}.
Consider {q2, q7}:

(({q2, q7}, a), ∅).
(({q2, q7}, b), {q3, q5, q8}).
(({q2, q7}, c), ∅).

active-states did not change. Consider {q2, q4, q7}:

(({q2, q4, q7}, a), ∅).
(({q2, q4, q7}, b), {q3, q5, q8}).
(({q2, q4, q7}, c), {q2, q7}).

active-states did not change. Consider {q2, q6, q7}:

(({q2, q6, q7}, a), ∅).
(({q2, q6, q7}, b), {q3, q5, q8}).
(({q2, q6, q7}, c), {q2, q7}).

active-states did not change. δ' has been computed for each element of
active-states.

4.  K' = {{q1, q2, q7}, ∅, {q1, q2, q3, q5, q7, q8}, {q2, q4, q6, q7},
{q1, q2, q3, q5, q6, q7, q8}, {q4}, {q3, q5, q8}, {q2, q7}, {q2, q4, q7}, {q2, q6, q7}}.

5.  A' = {{q1, q2, q3, q5, q7, q8}, {q1, q2, q3, q5, q6, q7, q8}, {q3, q5, q8}}.


Notice that, in Example 5.20, the original NDFSM had 8 states. So |P(K)| = 2^8 = 256.
There could have been that many states in the DFSM that was constructed from the
original machine. But only 10 of those are reachable from the start state and so can
play any role in the operation of the machine. We designed the algorithm ndfsmtodfsm
so that only those 10 would have to be built.
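
The idea of building only the reachable subsets can be sketched in Python. This is a minimal sketch rather than a transcription of ndfsmtodfsm: the dictionary encoding of the transition relation (with "" standing for ε), the function names, and the small three-state machine are all assumptions made for illustration.

```python
from collections import deque

def eps(q, delta):
    """Return the states reachable from q via zero or more epsilon ("") transitions."""
    seen, stack = {q}, [q]
    while stack:
        p = stack.pop()
        for r in delta.get((p, ""), set()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)

def ndfsm_to_dfsm(sigma, delta, start, accepting):
    """Subset construction that creates only subsets reachable from eps(start)."""
    s0 = eps(start, delta)
    active = {s0}                  # plays the role of active-states
    dfsm_delta = {}
    queue = deque([s0])
    while queue:
        Q = queue.popleft()
        for c in sorted(sigma):
            target = set()
            for q in Q:
                for r in delta.get((q, c), set()):
                    target |= eps(r, delta)
            target = frozenset(target)
            dfsm_delta[(Q, c)] = target
            if target not in active:   # a subset we have not seen before
                active.add(target)
                queue.append(target)
    accept = {Q for Q in active if Q & accepting}
    return active, dfsm_delta, s0, accept

# Hypothetical NDFSM: strings over {a, b} whose second-to-last symbol is a.
delta = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "a"): {2},
    (1, "b"): {2},
}
states, table, s0, accept = ndfsm_to_dfsm({"a", "b"}, delta, 0, {2})
print(len(states), "of", 2 ** 3, "possible subsets were built")
```

Because new subsets enter the queue only when some transition actually produces them, unreachable elements of P(K) are never created.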

Sometimes, however, all or almost all of the possible subsets of states are reachable.
Consider again the NDFSM of Example 5.15, the missing letter machine. Let's imagine
a slight variant that considers all 26 letters of the alphabet. That machine M has 27
states. So, in principle, the corresponding DFSM could have 2^27 states. And, this time,
all subsets are possible except that M cannot be in the start state, q0, at any time except
before the first character is read. So the DFSM that we would build if we applied
ndfsmtodfsm to M would have 2^26 + 1 states. In Section 5.6, we will describe a
technique for interpreting NDFSMs without converting them to DFSMs first. Using that
technique, highly nondeterministic machines, like the missing letter one, are still practical.

What happens if we apply ndfsmtodfsm to a machine that is already deterministic?
It must work, since every DFSM is also a legal NDFSM. You may want to try it on one
of the machines in Section 5.3. What you will see is that the machine that ndfsmtodfsm
builds, given an input DFSM M, is identical to M except for the names of the states.

5.5  From  FSMs  to  Operational  Systems 

An FSM is an abstraction. We can describe an FSM that solves a problem without
worrying about many kinds of implementation details. In fact, we don't even need to know
whether it will be etched into silicon or implemented in software.


Statecharts, which are based on the idea of hierarchically structured transition
networks, are widely used in software engineering precisely because
they enable system designers to work at varying levels of abstraction. (H.2)


FSMs for real problems can be turned into operational systems in any of a number
of ways:

• An FSM can be translated into a circuit design and implemented directly in
hardware. For example, it makes sense to implement the parity checking FSM of
Example 5.4 in hardware.

• An FSM can be simulated by a general purpose interpreter. We will describe
designs for such interpreters in the next section. Sometimes all that is required is a
simulation. In other cases, a simulation can be used to check a design before it is
translated into hardware.

• An FSM can be used as a specification for some critical aspect of the behavior of a
complex system. The specification can then be implemented in software just as any
specification might be. And the correctness of the implementation can be shown
by verifying that the implementation satisfies the specification (i.e., that it
matches the FSM).


Many network communication protocols, including the Alternating Bit
protocol and TCP, are described as FSMs. (I.1)


5.6 Simulators for FSMs •

Once we have created an FSM to solve a problem, we may want to simulate its
execution. In this section, we consider techniques for doing that, starting with DFSMs, and
then extending our ideas to handle nondeterminism.

5.6.1  Simulating  Deterministic  FSMs 

We  begin  by  considering  only  deterministic  FSMs.  One  approach  is  to  think  of  an 
FSM  as  the  specification  for  a  simple,  table-driven  program  and  then  proceed  to  write 
the  code. 


EXAMPLE  5.21  Hardcoding  a  Deterministic  FSM 

Consider the following deterministic FSM M that accepts the language L = {w ∈
{a, b}* : w contains no more than one b}.


We could view M as a specification for the following program:

Until accept or reject do:

    S:  s = get-next-symbol.
        If s = end-of-file then accept.
        Else if s = a then go to S.
        Else if s = b then go to T.

    T:  s = get-next-symbol.
        If s = end-of-file then accept.
        Else if s = a then go to T.
        Else if s = b then reject.

End.
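
The same hard-coded program can be sketched in Python. This is only one possible rendering: the function name is hypothetical, the end of the Python string stands in for end-of-file, and rejecting any symbol outside {a, b} is an assumption that the pseudocode does not address.

```python
def accepts_at_most_one_b(w: str) -> bool:
    """Hard-coded version of M: state S = no b seen yet, state T = exactly one b seen."""
    state = "S"
    for s in w:                      # running out of characters plays the role of end-of-file
        if state == "S":
            if s == "a":
                state = "S"          # go to S
            elif s == "b":
                state = "T"          # go to T
            else:
                return False         # assumption: symbols outside {a, b} are rejected
        else:                        # state == "T"
            if s == "a":
                state = "T"          # go to T
            else:
                return False         # a second b (or a foreign symbol): reject
    return True                      # both S and T accept at end of input

print(accepts_at_most_one_b("aabaa"), accepts_at_most_one_b("abab"))
```

Each state of M turns into one branch of the code, which is why the program grows with the size of the machine.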


Given an FSM M with states K, this approach will create a program of length
= 2 + (|K| · (|Σ| + 2)). The time required to analyze an input string w is
O(|w| · |Σ|). The biggest problem with this approach is that we must generate new
code for every FSM that we wish to run. Of course, we could write an FSM compiler
that did that for us. But we don't need to. We can, instead, build an interpreter that
executes the FSM directly.

Here's a simple interpreter for a deterministic FSM M = (K, Σ, δ, s, A):

dfsmsimulate(M: DFSM, w: string) =

1. st = s.

2. Repeat:

   2.1. c = get-next-symbol(w).

   2.2. If c ≠ end-of-file then:

        2.2.1. st = δ(st, c).

   until c = end-of-file.

3. If st ∈ A then accept else reject.

The algorithm dfsmsimulate runs in time approximately O(|w|), if we assume that
the lookup in step 2.2.1 can be implemented in constant time.
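
A table-driven interpreter of this kind might look as follows in Python. It is a sketch under stated assumptions: δ is stored in a dictionary keyed by (state, symbol), which gives expected constant-time lookup, and the example table encodes the machine of Example 5.21 with an explicit dead state D that the hard-coded program left implicit.

```python
def dfsm_simulate(delta, start, accepting, w):
    """Run a DFSM given as a transition table on the string w."""
    st = start
    for c in w:                 # the loop ends when the input is exhausted (end-of-file)
        st = delta[(st, c)]     # step 2.2.1: one dictionary lookup per symbol
    return st in accepting      # step 3

# Assumed encoding of the "no more than one b" machine; D is an added dead state.
delta = {
    ("S", "a"): "S", ("S", "b"): "T",
    ("T", "a"): "T", ("T", "b"): "D",
    ("D", "a"): "D", ("D", "b"): "D",
}
print(dfsm_simulate(delta, "S", {"S", "T"}, "aabaa"))
```

Running a different machine now requires only a different table, not new code.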

5.6.2  Simulating  Nondeterministic  FSMs 

Now suppose that we want to execute an NDFSM M. One solution is:

ndfsmconvertandsimulate(M: NDFSM) =
   dfsmsimulate(ndfsmtodfsm(M)).

But, as we saw in Section 5.4, converting an NDFSM to a DFSM can be very
inefficient in terms of both time and space. If M has k states, it could take time and space
equal to O(2^k) just to do the conversion, although the simulation, after the conversion,
would take time equal to O(|w|). So we would like a better way. We would like an
algorithm that directly simulates an NDFSM M without converting it to a DFSM first.

We sketched such an algorithm ndfsmsimulate in our discussion leading up to the
definition of the conversion algorithm ndfsmtodfsm. The idea is to simulate being in
sets of states at once. But, instead of generating all of the reachable sets of states right
away, as ndfsmtodfsm does, it generates them on the fly, as they are needed, being
careful not to get stuck chasing ε-loops.

We give here a more detailed description of ndfsmsimulate, which simulates an
NDFSM M = (K, Σ, Δ, s, A) running on an input string w:

ndfsmsimulate(M: NDFSM, w: string) =

1. Declare the set st.      /* st will hold the current state (a set of states from K).

2. Declare the set st1.     /* st1 will be built to contain the next state.

3. st = eps(s).             /* Start in all states reachable from s via only ε-transitions.

4. Repeat:

   c = get-next-symbol(w).

   If c ≠ end-of-file then do:

      st1 = ∅.

      For all q ∈ st do:                 /* Follow paths from all states M is currently in.

         For all r : (q, c, r) ∈ Δ do:   /* Find all states reachable from q via a
                                            transition labeled c.

            st1 = st1 ∪ eps(r).          /* Follow all ε-transitions from there.

      st = st1.                          /* Done following all paths. So st becomes
                                            M's new state.

      If st = ∅ then exit.               /* If all paths have died, quit.

   until c = end-of-file.

5. If st ∩ A ≠ ∅ then accept else reject.

Now there is no conversion cost. To analyze a string w requires |w| passes through
the main loop in step 4. In the worst case, M is in all states all the time and each of them
has a transition to every other one. So one pass could take as many as O(|K|^2) steps, for
a total cost of O(|w| · |K|^2).
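
The set-at-a-time simulation can be sketched in Python as follows. The encoding is an assumption: Δ is a dictionary from (state, symbol) to a set of states, with "" standing for ε, and the example is a three-letter version of the missing letter machine with hypothetical state names.

```python
def eps(q, delta):
    """Return all states reachable from q via zero or more epsilon ("") transitions."""
    seen, stack = {q}, [q]
    while stack:
        p = stack.pop()
        for r in delta.get((p, ""), set()):
            if r not in seen:          # the seen set keeps us from chasing epsilon-loops
                seen.add(r)
                stack.append(r)
    return seen

def ndfsm_simulate(delta, start, accepting, w):
    st = eps(start, delta)             # step 3
    for c in w:                        # step 4: one pass per input symbol
        st1 = set()
        for q in st:
            for r in delta.get((q, c), set()):
                st1 |= eps(r, delta)
        st = st1
        if not st:                     # all paths have died
            break
    return bool(st & accepting)        # step 5

# Missing letter machine over {a, b, c}: state no_x survives while x has not appeared.
letters = "abc"
delta = {("q0", ""): {"no_" + x for x in letters}}
for x in letters:
    for c in letters:
        if c != x:
            delta[("no_" + x, c)] = {"no_" + x}
accepting = {"no_" + x for x in letters}
print(ndfsm_simulate(delta, "q0", accepting, "abab"))
```

Only the sets of states that the particular input actually drives M into are ever built.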

There  is  also  a  third  way  we  could  build  a  simulator  for  an  NDFSM.  We  could  build 
a  depth-first  search  program  that  examines  the  paths  through  M  and  stops  whenever 
either  it  finds  a  path  that  accepts  or  it  has  tried  all  the  paths  there  are. 
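
A depth-first version might be sketched like this in Python. It searches configurations (state, input position) rather than raw paths and remembers the ones it has already visited, which is an added assumption that keeps ε-loops from causing infinite search; the transition encoding (a dictionary with "" for ε) and the tiny example machine are likewise illustrative.

```python
def ndfsm_dfs(delta, start, accepting, w):
    """Depth-first search for an accepting path of the NDFSM on w."""
    visited = set()
    stack = [(start, 0)]                       # configuration: (state, symbols read so far)
    while stack:
        q, i = stack.pop()
        if (q, i) in visited:
            continue
        visited.add((q, i))
        if i == len(w) and q in accepting:
            return True                        # found a path that accepts
        for r in delta.get((q, ""), set()):    # epsilon moves consume no input
            stack.append((r, i))
        if i < len(w):
            for r in delta.get((q, w[i]), set()):
                stack.append((r, i + 1))
    return False                               # every path has been tried

# Hypothetical machine with an epsilon-loop between states 0 and 1.
delta = {(0, ""): {1}, (1, ""): {0}, (1, "a"): {2}}
print(ndfsm_dfs(delta, 0, {2}, "a"))
```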

5.7  Minimizing  FSMs  • 

If we are going to solve a real problem with an FSM, we may want to find the smallest
one that does the job. We will say that a DFSM M is minimal iff there is no other
DFSM M' such that L(M) = L(M') and M' has fewer states than M does.

We  might  want  to  be  able  to  ask: 

1. Given a language, L, is there a minimal DFSM that accepts L?

2. If there is a minimal machine, is it unique?

3. Given a DFSM M that accepts some language L, can we tell whether M is minimal?

4. Given a DFSM M, can we construct a minimal equivalent DFSM M'?


The answer to all four questions is yes. We'll consider questions 1 and 2 first, and then
consider questions 3 and 4.


5.7.1  Building  a  Minimal  DFSM  for  a  Language 

Recall that in Section 5.3 we suggested that an effective way to think about the design
of a DFSM M to accept some language L over an alphabet Σ is to cluster the strings in
Σ* in such a way that strings that share a future will drive M to the same state. We will
now formalize that idea and use it as the basis for constructing a minimal DFSM to
accept L.

We will say that x and y are indistinguishable with respect to L, which we will write
as x ≈L y iff:


∀z ∈ Σ* (either both xz and yz ∈ L or neither is).

In other words, ≈L is a relation that is defined so that x ≈L y precisely in case, if x and
y are viewed as prefixes of some longer string, no matter what continuation string z
comes next, either both xz and yz are in L or both are not.


EXAMPLE 5.22 How ≈L Depends on L

If L = {a}*, then a ≈L aa ≈L aaa. But if L = {w ∈ {a, b}* : |w| is even}, then
a ≈L aaa, but it is not the case that a ≈L aa because, if z = a, we have aa ∈ L
but aaa ∉ L.
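
One way to experiment with ≈L is to search for a distinguishing continuation z by brute force. The Python sketch below is only a bounded test: it tries continuations up to an assumed length limit k, so finding no witness suggests, but does not prove, that two strings are indistinguishable. All function names are hypothetical.

```python
from itertools import product

def strings_up_to(sigma, k):
    """Yield every string over sigma of length 0..k, shortest first."""
    for n in range(k + 1):
        for t in product(sorted(sigma), repeat=n):
            yield "".join(t)

def distinguishing_suffix(x, y, in_L, sigma, k):
    """Return some z with |z| <= k such that exactly one of xz, yz is in L, else None."""
    for z in strings_up_to(sigma, k):
        if in_L(x + z) != in_L(y + z):
            return z
    return None

def even_length(w):
    return len(w) % 2 == 0

def only_as(w):                      # membership test for L = {a}*
    return set(w) <= {"a"}

print(repr(distinguishing_suffix("a", "aa", even_length, "ab", 3)))
print(distinguishing_suffix("a", "aaa", even_length, "ab", 3))
```

For the even-length language the search already separates a from aa with the empty continuation; z = a, the witness used in the example, works as well.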


We will say that x and y are distinguishable with respect to L iff they are not
indistinguishable. So, if x and y are distinguishable, then there exists at least one string z
such that one but not both of xz and yz is in L.

Note that ≈L is an equivalence relation because it is:

• Reflexive: ∀x ∈ Σ* (x ≈L x), because ∀x, z ∈ Σ* (xz ∈ L ↔ xz ∈ L).

• Symmetric: ∀x, y ∈ Σ* (x ≈L y → y ≈L x), because ∀x, y, z ∈ Σ* ((xz ∈ L ↔ yz ∈ L)
↔ (yz ∈ L ↔ xz ∈ L)).

• Transitive: ∀x, y, w ∈ Σ* (((x ≈L y) ∧ (y ≈L w)) → (x ≈L w)), because:

∀x, y, w, z ∈ Σ* (((xz ∈ L ↔ yz ∈ L) ∧ (yz ∈ L ↔ wz ∈ L)) → (xz ∈ L ↔ wz ∈ L)).

We will use three notations to describe the equivalence classes of ≈L:

• [1], [2], etc. will refer to explicitly numbered classes.

• [x] describes the equivalence class that contains the string x.

• [some logical expression P] describes the equivalence class of strings that satisfy P.
Since ≈L is an equivalence relation, its equivalence classes constitute a partition of
the set Σ*. So:

• No equivalence class of ≈L is empty, and

• Every string in Σ* is in exactly one equivalence class of ≈L.

What we will see soon is that the equivalence classes of ≈L correspond exactly to
the states of the minimum DFSM that accepts L. So every string in Σ* will drive that
DFSM to exactly one state.

Given some language L, how can we determine ≈L? Any pair of strings x and y are
related via ≈L unless there exists some z that could follow them and cause one to be
in L and the other not to be. So it helps to begin the analysis by considering simple
strings and seeing whether they are distinguishable or not. One way to start this
process is to begin lexicographically enumerating the strings in Σ* and continue until
a pattern has emerged.


EXAMPLE 5.23 Determining ≈L

Let Σ = {a, b}. Let L = {w ∈ Σ* : every a is immediately followed by a b}.

To determine the equivalence classes of ≈L, we begin by creating a first class
[1] and arbitrarily assigning ε to it. Now consider a. It is distinguishable from ε
since εab ∈ L but aab ∉ L. So we create a new equivalence class [2] and put a in
it. Now consider b. b ≈L ε since every string is in L unless it has an a that is not
followed by a b. Neither of these has an a that could have that problem. So they are
both in L as long as their continuation doesn't violate the rule. If their
continuation does violate the rule, they are both out. So b goes into [1].

Next we try aa. It is distinguishable from the strings in [1] because the strings in
[1] are in L but aa is not. So, consider ε as a continuation string. Take any string in
[1] and concatenate ε. The result is still in L. But aaε is not in L. We also notice
that aa is distinguishable from a, and so cannot be in [2], because a still has a
chance to become in L if it is followed by a string that starts with a b. But aa is out,
no matter what comes next. We create a new equivalence class [3] and put aa in it.
We continue in this fashion until we discover the property that holds of each
equivalence class.

The equivalence classes of ≈L are:

[1]  [ε, b, abb, ...]     [all strings in L].

[2]  [a, abbba, ...]      [all strings that end in a and have no prior
                          a that is not followed by a b].

[3]  [aa, abaa, ...]      [all strings that contain at least one
                          instance of aa].
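
The enumeration strategy used in this example can be mimicked mechanically. The Python sketch below groups short strings by their behavior on all continuations up to an assumed bound k; because the bound is finite, the grouping can only merge classes that longer continuations would separate, never split a true class. The bounds and helper names are illustrative assumptions.

```python
from itertools import product

def strings_up_to(sigma, k):
    """Yield every string over sigma of length 0..k, shortest first."""
    for n in range(k + 1):
        for t in product(sorted(sigma), repeat=n):
            yield "".join(t)

def in_L(w):
    """True iff every a in w is immediately followed by a b."""
    return all(i + 1 < len(w) and w[i + 1] == "b"
               for i, ch in enumerate(w) if ch == "a")

def approximate_classes(in_L, sigma, max_len, k):
    """Group strings of length <= max_len by membership of xz for all |z| <= k."""
    suffixes = list(strings_up_to(sigma, k))
    classes = {}
    for x in strings_up_to(sigma, max_len):
        signature = tuple(in_L(x + z) for z in suffixes)
        classes.setdefault(signature, []).append(x)
    return list(classes.values())

classes = approximate_classes(in_L, "ab", max_len=4, k=2)
for members in classes:
    print(members[:4])
```

With these bounds the strings fall into three groups, matching the three classes listed above.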
Even this simple example illustrates three key points about ≈L:

• No equivalence class can contain both strings that are in L and strings that are not.
This is clear if we consider the continuation string ε. If x ∈ L then xε ∈ L. If y ∉ L
then yε ∉ L. So x and y are distinguishable by ε.

• If there are strings that would take a DFSM for L to the dead state (in other words,
strings that are out of L no matter what comes next), then there will be one
equivalence class of ≈L that corresponds to the dead state.

• Some equivalence class contains ε. It will correspond to the start state of the
minimal machine that accepts L.


EXAMPLE 5.24 When More Than One Class Contains Strings in L

Let Σ = {a, b}. Let L = {w ∈ {a, b}* : no two adjacent characters are the
same}. The equivalence classes of ≈L are:

[1]  [ε]                       [ε].

[2]  [a, aba, ababa, ...]      [all nonempty strings that end in a
                               and have no identical adjacent
                               characters].

[3]  [b, ab, bab, abab, ...]   [all nonempty strings that end in b
                               and have no identical adjacent
                               characters].

[4]  [aa, abaa, ababb, ...]    [all strings that contain at least
                               one pair of identical adjacent
                               characters].

From this example, we make one new observation about ≈L:

• While no equivalence class may contain both strings that are in L and strings that are
not, there may be more than one equivalence class that contains strings that are in
L. For example, in this last case, all the strings in classes [1], [2], and [3] are in L. Only
those that are in [4], which corresponds to the dead state, are not in L. That is
because of the structure of L: Any string is in L until it violates the rule, and then it is
hopelessly out.

Does ≈L always have a finite number of equivalence classes? It has in the two
examples we have considered so far. But let's consider another one.
EXAMPLE 5.25 ≈L for A^nB^n

Let Σ = {a, b}. Let L = A^nB^n = {a^n b^n : n ≥ 0}.

We can begin constructing the equivalence classes of ≈L:

[1]  [ε].

[2]  [a].

[3]  [aa].

[4]  [aaa].

But we seem to be in trouble. Each new string of a's has to go in an equivalence
class distinct from the shorter strings because each string requires a different
continuation string in order to become in L. So the set of equivalence classes of ≈L
must include at least all of the following classes:

{[n] : n is a positive integer and [n] contains the single string a^(n-1)}

Of course, classes that include strings that contain b's are also required.


So, if L = A^nB^n, then ≈L has an infinite number of equivalence classes. This should
come as no surprise. A^nB^n is not regular, as we will prove in Chapter 8. If the
equivalence classes of ≈L are going to correspond to the states of a machine to accept L, then
there will be a finite number of equivalence classes precisely in case L is regular.

We are now ready to talk about DFSMs and to examine the relationship between
≈L and any DFSM that accepts L. To help do that we will say that a state q of a DFSM
M contains the set of strings s such that M, when started in its start state, lands in q
after reading s.

THEOREM 5.4 ≈L Imposes a Lower Bound on the Minimum Number of
States of a DFSM for L

Theorem: Let L be a regular language and let M = (K, Σ, δ, s, A) be a DFSM that
accepts L. The number of states in M is greater than or equal to the number of
equivalence classes of ≈L.

Proof: Suppose that the number of states in M were less than the number of
equivalence classes of ≈L. Then, by the pigeonhole principle, there must be at least one
state q that contains strings from at least two equivalence classes of ≈L. But then
M's future behavior on those strings will be identical, which is not consistent with
the fact that they are in different equivalence classes of ≈L.

So now we know a lower bound on the number of states that are required to build
an FSM to accept a language L. But is it always possible to find a DFSM M such that
|K_M| is exactly equal to the number of equivalence classes of ≈L? The answer is yes.
THEOREM 5.5 There Exists a Unique Minimal DFSM for Every Regular
Language

Theorem: Let L be a regular language over some alphabet Σ. Then there is a
DFSM M that accepts L and that has precisely n states where n is the number of
equivalence classes of ≈L. Any other DFSM that accepts L must either have
more states than M or it must be equivalent to M except for state names.

Proof: The proof is by construction of M = (K, Σ, δ, s, A), where:

• K contains n states, one for each equivalence class of ≈L.

• s = [ε], the equivalence class of ε under ≈L.

• A = {[x] : x ∈ L}.

• δ([x], a) = [xa]. In other words, if M is in the state that contains some string x,
then, after reading the next symbol a, it will be in the state that contains xa.

For this construction to prove the theorem, we must show:

• K is finite. Since L is regular, it is accepted by some DFSM M'. M' has some
finite number of states m. By Theorem 5.4, n ≤ m. So K is finite.

• δ is a function. In other words, it is defined for all (state, input) pairs and it
produces, for each of them, a unique value. The construction defines a value of
δ for all (state, input) pairs. The fact that the construction guarantees a unique
such value follows from the definition of ≈L.

• L = L(M). In other words, M does in fact accept the language L. To prove
this, we must first show that ∀s, t (([ε], st) |-M* ([s], t)). In other words,
when M starts in its start state and has a string that we are describing as
having two parts, s and t, to read, it correctly reads the first part s and lands
in the state [s], with t left to read. We do this by induction on |s|. If |s| = 0
then we have ([ε], εt) |-M* ([ε], t), which is true since M simply makes
zero moves. Assume that the claim is true if |s| = k. Then we consider
what happens when |s| = k + 1. |s| ≥ 1, so we can let s = yc where y ∈ Σ*
and c ∈ Σ. We have:

/* M reads the first k characters:

([ε], yct) |-M* ([y], ct)     (induction hypothesis, since |y| = k).

/* M reads one more character:

([y], ct) |-M* ([yc], t)      (definition of δM).

/* Combining those two, after M has read k + 1 characters:

([ε], yct) |-M* ([yc], t)     (transitivity of |-M*).

([ε], st) |-M* ([s], t)       (definition of s as yc).
Now let t be ε. (In other words, we are examining M's behavior after it
reads its entire input string.) Let s be any string in Σ*. By the claim we just
proved, ([ε], s) |-M* ([s], ε). M will accept s iff [s] ∈ A, which, by the way in
which A was constructed, it will be if the strings in [s] are in L. So M accepts
precisely those strings that are in L.

• There exists no smaller machine M# that also accepts L. This follows directly
from Theorem 5.4, which says that the number of equivalence classes of ≈L
imposes a lower bound on the number of states in any DFSM that accepts L.

• There is no different machine M# that also has n states and that accepts L.
Consider any DFSM M# with n states. We show that either M# is identical to
M (up to state names) or L(M#) ≠ L(M).

Since we do not care about state names, we can standardize them. Call
the start state of both M and M# state 1. Define a lexicographic ordering
on the elements of Σ. Number the rest of the states in both M and M# as
follows:

Until all states have been numbered do:

Let q be the lowest numbered state from which there are transitions
that lead to an as yet unnumbered state.

List the transitions that lead out from q to any unnumbered state. Sort
those transitions lexicographically by the symbol on them.

Go through the sorted transitions (q, a, p), in order, and, for each, assign
the next unassigned number to state p.

Note that M# has n states and there are n equivalence classes of ≈L. Since
none of those equivalence classes is empty (by the definition of equivalence
classes), M# either wastes no states (i.e., every state contains at least one string)
or, if it does waste any states, it has at least one state that contains strings in
different equivalence classes of ≈L. If the latter, then L(M#) ≠ L. So we assume
the former. Now suppose that M# is different from M. Then there would have to
be at least one state q and one input symbol c such that M has a transition (q, c, r)
and M# has a transition (q, c, t) and r ≠ t. Call the set of strings that r contains
[r]. Since M# has no unused states (i.e., states that contain no strings), by the
pigeonhole principle, M#'s transition (q, c, t) must send some string s in [r] to a
state, t, that also contains strings that are not in [r]. All strings in [t] will then
share all futures with s. But s is distinguishable from the strings in [r]. If two
strings that are distinguishable with respect to L share all futures in M#, then
L(M#) ≠ L. Contradiction.


The construction that we used to prove Theorem 5.5 is useful in its own right:
We can use it, if we know ≈L, to construct a minimal DFSM for L.
EXAMPLE 5.26 Building a Minimal DFSM from ≈L

We consider again the language of Example 5.24: Let Σ = {a, b}. Let L = {w ∈
{a, b}* : no two adjacent characters are the same}.

The equivalence classes of ≈L are:

[1]  [ε]                       {ε}.

[2]  [a, aba, ababa, ...]      {all nonempty strings that end in a
                               and have no identical adjacent
                               characters}.

[3]  [b, ab, bab, abab, ...]   {all nonempty strings that end in b
                               and have no identical adjacent
                               characters}.

[4]  [aa, abaa, ababb, ...]    {all strings that contain at least one
                               pair of identical adjacent characters;
                               these strings are not in L, no matter
                               what comes next}.

We build a minimal DFSM M to accept L as follows:

• The equivalence classes of ≈L become the states of M.

• The start state is [ε] = [1].

• The accepting states are all equivalence classes that contain strings in L, namely
[1], [2], and [3].

• δ([x], a) = [xa]. So, for example, equivalence class [1] contains the string ε.
If the character a follows ε, the resulting string, a, is in equivalence class [2].
So we create a transition from [1] to [2] labeled a. Equivalence class [2]
contains the string a. If the character b follows a, the resulting string, ab, is in
equivalence class [3]. So we create a transition from [2] to [3] labeled b. And
so forth.
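
The construction steps above can be sketched in Python for this particular language. The sketch assumes we already know how to name the class of any string (the function class_of below hard-codes the four classes) and picks one representative string per class; both the function names and the choice of representatives are assumptions made for illustration.

```python
def in_L(w):
    """True iff no two adjacent characters of w are the same."""
    return all(w[i] != w[i + 1] for i in range(len(w) - 1))

def class_of(w):
    """Number of the equivalence class of w, using the numbering of the example."""
    if not in_L(w):
        return 4                     # dead: contains a pair of identical adjacent characters
    if w == "":
        return 1
    return 2 if w[-1] == "a" else 3

representatives = {1: "", 2: "a", 3: "b", 4: "aa"}   # one string chosen from each class
sigma = "ab"

# delta([x], a) = [xa], computed from the representative x of each class.
delta = {(n, c): class_of(x + c) for n, x in representatives.items() for c in sigma}
start = class_of("")
accepting = {n for n, x in representatives.items() if in_L(x)}

def run(w):
    st = start
    for c in w:
        st = delta[(st, c)]
    return st in accepting

print(delta[(1, "a")], delta[(2, "b")], run("abab"), run("abba"))
```

Any other representative of a class would yield the same table, since strings in one class share all futures.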


The fact that it is always possible to construct a minimum DFSM M to accept any regular
language L is good news. As we will see later, the fact that that minimal DFSM is unique up
to state names is also useful. In particular, we will use it as a basis for an algorithm that
checks two DFSMs to see if they accept the same language. The theorem that we have
just proven is also useful because it gives us an easy way to prove the following result,
which goes by two names, Nerode's theorem and the Myhill-Nerode theorem.

THEOREM 5.6 Myhill-Nerode Theorem

Theorem: A language is regular iff the number of equivalence classes of ≈L is finite.

Proof: We do two proofs to show the two directions of the implication:

L regular → the number of equivalence classes of ≈L is finite: If L is regular, then
there exists some DFSM M that accepts L. M has some finite number of states m. By
Theorem 5.4, the number of equivalence classes of ≈L is ≤ m. So the number of
equivalence classes of ≈L is finite.

The number of equivalence classes of ≈L is finite → L regular: If the number of
equivalence classes of ≈L is finite, then the construction that was described in the
proof of Theorem 5.5 will build a DFSM that accepts L. So L must be regular.


The Myhill-Nerode theorem gives us our first technique for proving that a language
L, such as A^nB^n, is not regular. It suffices to show that ≈L has an infinite number of
equivalence classes. But using the Myhill-Nerode theorem rigorously is difficult. In
Chapter 8, we will introduce other methods that are harder to use incorrectly.

5.7.2  Minimizing  an  Existing  DFSM 

Now suppose that we already have a DFSM M that accepts L. In fact, possibly M is the
only definition we have of L. In this case, it makes sense to construct a minimal DFSM
to accept L by starting with M rather than with ≈L. There are two approaches that we
could take to constructing a minimization algorithm:

1. Begin with M and collapse redundant states, getting rid of one at a time until the
resulting machine is minimal.

2. Begin by overclustering the states of M into just two groups, accepting and
nonaccepting. Then iteratively split those groups apart until all the distinctions that L
requires have been made.

Both approaches work. We will present an algorithm that takes the second one.

Our goal is to end up with a minimal machine in which all equivalent states of M
have been collapsed. In order to do that, we need a precise definition of what it means
for two states to be equivalent (and thus collapsible). We will use the following:

We will say that two states q and p in M are equivalent, which we will write q ≡ p,
iff for all strings w ∈ Σ*, either w drives M to an accepting state from both q and p or it
drives M to a rejecting state from both q and p. In other words, no matter what
continuation string comes next, M behaves identically from both states. Note that ≡ is an
equivalence relation over states, so it will partition the states of M into a set of
equivalence classes.


EXAMPLE 5.27 A Nonminimal DFSM with Two Equivalent States

Let Σ = {a, b}. Let L = {w ∈ Σ* : |w| is even}. Consider the following FSM
that accepts L:


In this machine state q2 ≡ state q3.


For two states q and p to be equivalent, they must yield the same outcome for all
possible continuation strings. We can't claim an algorithm for finding equivalent states
that works by trying all possible continuation strings since there is an infinite number
of them (assuming that Σ is not empty). Fortunately, we can show that it is necessary to
consider only a finite subset of them. In particular, we will consider them one character
at a time, and quit when considering another character has no effect on the machine we
are building.

We define a series of equivalence relations ≡^n, for values of n ≥ 0. For any two
states p and q, p ≡^n q iff p and q yield the same outcome for all strings of length n. So:

• p ≡^0 q iff they behave equivalently when they read ε. In other words, if they are
both accepting or both rejecting states.

• p ≡^1 q iff they behave equivalently when they read any string of length 1. In other
words, if any single character sends both of them to an accepting state or both of
them to a rejecting state. Note that this is equivalent to saying that any single
character sends them to states that are ≡^0 to each other.

• p ≡^2 q iff they behave equivalently when they read any string of length 2, which
they will do if, when they read the first character they land in states that are ≡^1 to
each other. By the definition of ≡^1, they will then yield the same outcome when
they read the single remaining character.

• And so forth.
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We  can  state  this  definition  concisely  as  follows.  For  all  p.  q  e  K: 

•  p  q  iff  they  are  both  accepting  or  both  rejecting  stales. 

•  For  all /i  s  \.q  p  iff: 

•  q  a"~l  p,  and 

•  Vr/e  !(8{p,a)  s"'1  8(q,a)). 
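
The concise definition can be transcribed almost word for word into a recursive function. The following Python sketch is only an illustration: the name make_equiv, the dictionary encoding of δ, and the three-state machine are assumptions made for the example, not part of the text.

```python
from functools import lru_cache

def make_equiv(delta, accepting, alphabet):
    """Return equiv(n, p, q), a direct transcription of the recursive definition."""
    @lru_cache(maxsize=None)
    def equiv(n, p, q):
        if n == 0:
            # Base case: both accepting or both rejecting.
            return (p in accepting) == (q in accepting)
        # Inductive case: equivalent at level n-1, and every character
        # leads to states that are equivalent at level n-1.
        return equiv(n - 1, p, q) and all(
            equiv(n - 1, delta[(p, a)], delta[(q, a)]) for a in alphabet
        )
    return equiv

# Hypothetical three-state DFSM over {a, b} that accepts strings of even length.
delta = {(1, "a"): 2, (1, "b"): 2,
         (2, "a"): 3, (2, "b"): 3,
         (3, "a"): 2, (3, "b"): 2}
equiv = make_equiv(delta, frozenset({1, 3}), ("a", "b"))
```

Here equiv(n, 1, 3) holds for every n, while states 1 and 2 are already separated at n = 0. Memoizing on (n, p, q) keeps the recursion from recomputing the same pairs.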

We will define minDFSM, a minimization algorithm that takes as its input a DFSM
M = (K, Σ, δ, s, A). MinDFSM will construct a minimal DFSM M' that is equivalent
to M. It begins by constructing ≡⁰, which divides the states of M into at most two
equivalence classes, corresponding to A and K − A. If M has no accepting states or if all
its states are accepting, then there will be only one nonempty equivalence class and we
can quit since there is a one-state machine that is equivalent to M. We consider
therefore only those cases where both A and K − A are nonempty.

MinDFSM executes a sequence of steps, during which it constructs the sequence of
equivalence relations ≡¹, ≡², .... To construct ≡ᵏ⁺¹, minDFSM begins with ≡ᵏ.
But then it splits equivalence classes of ≡ᵏ whenever it discovers some pair of states
that do not behave equivalently. MinDFSM halts when it discovers that ≡ⁿ is the same
as ≡ⁿ⁺¹. Any further steps would operate on the same set of equivalence classes and so
would also fail to find any states that need to be split.

We  can  now  state  the  algorithm: 

minDFSM(M: DFSM) =

1. classes = {A, K − A}.   /* Initially, just two classes of states, accepting and
                              rejecting.

2. Repeat until a pass at which no change to classes has been made:

   2.1. newclasses = ∅.   /* At each pass, we build a new set of classes, splitting
                             the old ones as necessary. Then this new set becomes
                             the old set, and the process is repeated.

   2.2. For each equivalence class e in classes, if e contains more than one state, see
        if it needs to be split:

        For each state q in e do:   /* Look at each state and build a table of
                                       what it does. Then the tables for all
                                       states in the class can be compared to
                                       see if there are any differences that
                                       force splitting.

            For each character c in Σ do:

                Determine which element of classes q goes to if c is read.

        If there are any two states p and q such that there is any character c such
        that, when c is read, p goes to one element of classes and q goes to
        another, then p and q must be split. Create as many new equivalence
        classes as are necessary so that no state remains in the same class
        with a state whose behavior differs from its own. Insert those classes into
        newclasses.

        If there are no states whose behavior differs, no splitting is necessary.
        Insert e into newclasses.

   2.3. classes = newclasses.

/* The states of the minimal machine will correspond exactly to the elements of
   classes at this point. We use the notation [q] for the element of classes that contains
   the original state q.

3. Return M' = (classes, Σ, δM', [sM], {[q] : the elements of [q] are in AM}), where δM'
   is constructed as follows:

   if δM(q, c) = p, then δM'([q], c) = [p].

Clearly, no class that contains a single state can be split. So, if |K| is k, then the
maximum number of times that minDFSM can split classes is k − 1. Since minDFSM halts
when no more splitting can occur, the maximum number of times it can go through the
loop is k − 1. Thus minDFSM must halt in a finite number of steps. M' is the minimal
DFSM that is equivalent to M since:

• M' is minimal: It splits classes and thus creates new states only when necessary to
simulate M, and

• L(M') = L(M): The proof of this is straightforward by induction on the length of
the input string.
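
To make the splitting loop concrete, here is a minimal Python sketch of minDFSM as partition refinement. It is one possible rendering, not the book's code: the function name min_dfsm, the dictionary encoding of δ, and the use of frozensets for classes are assumptions. The six-state machine at the end is also hypothetical; its transitions were chosen to be consistent with the trace in Example 5.28, whose diagram is not reproduced here.

```python
def min_dfsm(states, alphabet, delta, start, accepting):
    """Return (classes, delta', start', accepting') for a complete DFSM."""
    accepting = frozenset(accepting)
    # Step 1: accepting and rejecting states (an empty class is dropped).
    classes = [c for c in (accepting, frozenset(states) - accepting) if c]
    while True:
        # Index of the class that currently contains each state.
        index = {q: i for i, c in enumerate(classes) for q in c}
        new_classes = []
        for c in classes:
            # Group the states of c by which classes they reach on each character.
            groups = {}
            for q in c:
                signature = tuple(index[delta[(q, a)]] for a in alphabet)
                groups.setdefault(signature, set()).add(q)
            new_classes.extend(frozenset(g) for g in groups.values())
        if len(new_classes) == len(classes):  # a pass with no split: done
            break
        classes = new_classes
    block = {q: c for c in classes for q in c}  # the class [q] of each state q
    new_delta = {(block[q], a): block[delta[(q, a)]]
                 for q in states for a in alphabet}
    new_accepting = {c for c in classes if c <= accepting}
    return set(classes), new_delta, block[start], new_accepting

# Hypothetical machine: states 1..6, start state 1, accepting states 2 and 4.
example_delta = {(1, "a"): 2, (1, "b"): 4, (2, "a"): 3, (2, "b"): 6,
                 (3, "a"): 2, (3, "b"): 4, (4, "a"): 6, (4, "b"): 5,
                 (5, "a"): 2, (5, "b"): 4, (6, "a"): 6, (6, "b"): 6}
classes, new_delta, new_start, new_accepting = min_dfsm(
    range(1, 7), "ab", example_delta, 1, {2, 4})
```

Because a pass can only split classes and never merge them, comparing the number of classes before and after a pass is enough to detect that nothing changed.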


EXAMPLE  5.28  Using  minDFSM  to  Find  a  Minimal  Machine 

Let Σ = {a, b}. Let M =


We  will  show  the  operation  of  minDFSM  at  each  step: 
Initially, classes = {[2, 4], [1, 3, 5, 6]}.

At step 1:

((2, a), [1, 3, 5, 6])    ((4, a), [1, 3, 5, 6])    No splitting required here.
((2, b), [1, 3, 5, 6])    ((4, b), [1, 3, 5, 6])

((1, a), [2, 4])    ((3, a), [2, 4])    ((5, a), [2, 4])    ((6, a), [1, 3, 5, 6])
((1, b), [2, 4])    ((3, b), [2, 4])    ((5, b), [2, 4])    ((6, b), [1, 3, 5, 6])

There are two different patterns, so we must split into two classes, [1, 3, 5] and
[6]. Note that, although [6] has the same behavior as [2, 4] after reading a single
character, it cannot be combined with [2, 4] because they do not share behavior
after reading no characters.

Classes = {[2, 4], [1, 3, 5], [6]}.

At step 2:

((2, a), [1, 3, 5])    ((4, a), [6])          These two must be split.
((2, b), [6])          ((4, b), [1, 3, 5])

((1, a), [2, 4])    ((3, a), [2, 4])    ((5, a), [2, 4])    No splitting required.
((1, b), [2, 4])    ((3, b), [2, 4])    ((5, b), [2, 4])

Classes = {[2], [4], [1, 3, 5], [6]}.

At step 3:

((1, a), [2])    ((3, a), [2])    ((5, a), [2])    No splitting required.
((1, b), [4])    ((3, b), [4])    ((5, b), [4])

So  minDFSM  returns  M'  = 


5.8  A  Canonical  Form  for  Regular  Languages 

A canonical form for some set of objects C assigns exactly one representation to each
class of “equivalent” objects in C. Further, each such representation is distinct, so two
objects in C share the same representation iff they are “equivalent” in the sense for
which we define the form.


The ordered binary decision diagram (OBDD) is a canonical form for
Boolean expressions that makes it possible for model checkers to verify the
correctness of very large concurrent systems and hardware circuits. (B.1.3)


Suppose that we had a canonical form for FSMs with the property that two FSMs
share a canonical form iff they accept the same language. Further suppose that we had
an algorithm that, on input M, constructed M's canonical form. Then some questions
about FSMs would become easy to answer. For example, we could test whether two
FSMs are equivalent (i.e., they accept the same language). It would suffice to construct
the canonical form for each of them and test whether the two forms are identical.

The algorithm minDFSM constructs, from any DFSM M, a minimal machine that
accepts L(M). By Theorem 5.5, all minimal machines for L(M) are identical except
possibly for state names. So, if we could define a standard way to name states, we could
define a canonical machine to accept L(M) (and thus any regular language). The
following algorithm does this by using the state-naming convention that we described in
the proof of Theorem 5.5:

buildFSMcanonicalform(M: FSM) =

1. M' = ndfsmtodfsm(M).

2. M# = minDFSM(M').

3. Create a unique assignment of names to the states of M# as follows:

   3.1. Call the start state q0.

   3.2. Define an order on the elements of Σ.

   3.3. Until all states have been named do:

        Select the lowest numbered named state that has not yet been selected. Call it q.

        Create an ordered list of the transitions out of q by the order imposed on
        their labels.

        Create an ordered list of the as yet unnamed states that those transitions
        enter by doing the following: If the first transition is (q, c1, p1), then put
        p1 first. If the second transition is (q, c2, p2) and p2 is not already on the
        list, put it next. If it is already on the list, skip it. Continue until all
        transitions have been considered. Remove from the list any states that have
        already been named.

        Name the states on the list that was just created: Assign to the first one the
        name qk, where k is the smallest index that hasn't yet been used. Assign
        the next name to the next state and so forth until all have been named.

4. Return M#.

Given two FSMs M1 and M2, buildFSMcanonicalform(M1) = buildFSMcanonicalform(M2)
iff L(M1) = L(M2). We'll see, in Section 9.1.4, one important use for this
canonical form: It provides the basis for a simple way to test whether an FSM accepts
any strings or whether two FSMs are equivalent.
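
The naming procedure of step 3 amounts to a breadth-first traversal that visits outgoing transitions in a fixed alphabet order. The sketch below illustrates it under stated assumptions: the input is taken to be an already minimal, complete DFSM given as a transition dictionary, and the names canonical_names and canonical_form are invented for this example.

```python
from collections import deque

def canonical_names(alphabet, delta, start):
    """Number states 0, 1, 2, ... in breadth-first order from the start state."""
    names = {start: 0}
    queue = deque([start])
    while queue:
        q = queue.popleft()         # lowest-numbered named state not yet selected
        for c in sorted(alphabet):  # the fixed order on the alphabet
            p = delta[(q, c)]
            if p not in names:      # give the next unused index to a new state
                names[p] = len(names)
                queue.append(p)
    return names

def canonical_form(alphabet, delta, start, accepting):
    """Rename the states of a minimal DFSM; states are reported as integers."""
    names = canonical_names(alphabet, delta, start)
    new_delta = {(names[q], c): names[p]
                 for (q, c), p in delta.items() if q in names}
    new_accepting = frozenset(names[q] for q in accepting if q in names)
    return new_delta, 0, new_accepting

# Two hypothetical minimal machines for the even-length language,
# identical except for their state names.
form1 = canonical_form("ab", {("u", "a"): "v", ("u", "b"): "v",
                              ("v", "a"): "u", ("v", "b"): "u"}, "u", {"u"})
form2 = canonical_form("ab", {(7, "a"): 3, (7, "b"): 3,
                              (3, "a"): 7, (3, "b"): 7}, 7, {7})
```

With this encoding, testing whether the two machines accept the same language reduces to the comparison form1 == form2.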

5.9 Finite State Transducers •

So far, we have used finite state machines as language recognizers. All we have cared
about, in analyzing a machine M, is whether or not M ends in an accepting state. But it
is a simple matter to augment our finite state model to allow for output at each step of
a machine's operation. Often, once we do that, we may cease to care about whether M
actually accepts any strings. Many finite state transducers are loops that simply run
forever, processing inputs.

One simple kind of finite state transducer associates an output with each state of a
machine M. That output is generated whenever M enters the associated state.
Deterministic finite state transducers of this sort are called Moore machines, after their
inventor Edward Moore. A Moore machine M is a seven-tuple (K, Σ, O, δ, D, s, A), where:

• K is a finite set of states,

• Σ is an input alphabet,

• O is an output alphabet,

• s ∈ K is the start state,

• A ⊆ K is the set of accepting states (although for some applications this designation
is not important),

• δ is the transition function. It is a function from (K × Σ) to (K), and

• D is the display or output function. It is a function from (K) to (O*).

A Moore machine M computes a function f(w) iff, when it reads the input string w, its
output sequence is f(w).


EXAMPLE  5.29  A  Typical  United  States  Traffic  Light 

Consider the following controller for a single direction of a very simple U.S. traffic
light (which ignores time of day, traffic, the need to let emergency vehicles
through, etc.). We will also ignore the fact that a practical controller has to manage
all directions for a particular intersection. In Exercise 5.16, we will explore removing
some of these limitations.

The states in this simple controller correspond to the light's colors: green, yellow
and red. Note that the definition of the start state is arbitrary. There are three
inputs, all of which are elapsed time.
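
A Moore machine of this kind is easy to simulate. The following Python sketch is only an illustration of the definition: the input symbol names, the one-letter outputs, and the convention that the start state's output is emitted before any input is read are all assumptions, since the controller's diagram is not reproduced here.

```python
def run_moore(delta, display, start, inputs):
    """Return the list of outputs generated as the machine enters each state."""
    state = start
    outputs = [display[state]]  # assumed convention: the start state also emits
    for symbol in inputs:
        state = delta[(state, symbol)]  # a missing entry raises KeyError
        outputs.append(display[state])
    return outputs

# Hypothetical input symbols: each signals that one color's timer has elapsed.
light_delta = {("green", "green_elapsed"): "yellow",
               ("yellow", "yellow_elapsed"): "red",
               ("red", "red_elapsed"): "green"}
light_display = {"green": "G", "yellow": "Y", "red": "R"}

cycle = run_moore(light_delta, light_display, "green",
                  ["green_elapsed", "yellow_elapsed", "red_elapsed"])
```

The output depends only on the state entered, which is exactly what distinguishes a Moore machine from the Mealy machines defined next.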


A different definition for a deterministic finite state transducer permits each
machine to output any finite sequence of symbols as it makes each transition (in other
words, as it reads each symbol of its input). FSMs that associate outputs with transitions
are called Mealy machines, after their inventor George Mealy. A Mealy machine M is a
six-tuple (K, Σ, O, δ, s, A), where:

• K is a finite set of states,

• Σ is an input alphabet,

• O is an output alphabet,

• s ∈ K is the start state,

• A ⊆ K is the set of accepting states, and

• δ is the transition function. It is a function from (K × Σ) to (K × O*).

A Mealy machine M computes a function f(w) iff, when it reads the input string w, its
output sequence is f(w).

EXAMPLE  5.30  Generating  Parity  Bits 

The following Mealy machine adds an odd parity bit after every four binary digits that
it reads. We will use the notation a/b on an arc to mean that the transition may be
followed if the input character is a. If it is followed, then the string b will be generated.
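
Since the machine's diagram is not reproduced here, the following Python sketch reconstructs one plausible version of it. The assumptions are that each input bit is echoed to the output, that a state records how many bits of the current block have been read and whether the number of 1s so far is odd, and that the names build_parity_mealy and run_mealy are invented for the example.

```python
def build_parity_mealy():
    """Map (state, bit) to (next state, output string).

    A state is a pair (bits read in the current block, number of 1s mod 2).
    """
    delta = {}
    for count in range(4):
        for ones in range(2):
            for bit in "01":
                new_ones = (ones + int(bit)) % 2
                if count < 3:
                    delta[((count, ones), bit)] = ((count + 1, new_ones), bit)
                else:
                    # Fourth bit: echo it, then emit the bit that makes the
                    # total number of 1s in the five-bit group odd.
                    parity = "0" if new_ones == 1 else "1"
                    delta[((count, ones), bit)] = ((0, 0), bit + parity)
    return delta

def run_mealy(delta, start, inputs):
    """Concatenate the outputs produced on each transition."""
    state, out = start, []
    for symbol in inputs:
        state, emitted = delta[(state, symbol)]
        out.append(emitted)
    return "".join(out)

parity_delta = build_parity_mealy()
encoded = run_mealy(parity_delta, (0, 0), "10110000")
```

The table has eight states, so the transducer is finite even though it processes inputs of any length.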


Digital circuits can be modeled as transducers using either Moore or Mealy
machines. (P.3)


EXAMPLE  5.31  A  Bar  Code  Reader 

Bar codes are ubiquitous. We consider here a simplification: a bar code system
that encodes just binary numbers. Imagine a bar code such as:


It is composed of columns, each of the same width. A column can be either white
or black. If two black columns occur next to each other, it will look to us like a single,
wide, black column, but the reader will see two adjacent black columns of the
standard width. The job of the white columns is to delimit the black ones. A single
black column encodes 0. A double black column encodes 1.

We can build a finite state transducer to read such a bar code and output a string
of binary digits. We'll represent a black bar with the symbol B and a white bar with
the symbol W. The input to the transducer will be a sequence of those symbols,
corresponding to reading the bar code left to right. We'll assume that every correct bar
code starts with a black column, so white space ahead of the first black column is
ignored. We'll also assume that after every complete bar code there are at least two
white columns. So the reader should, at that point, reset to be ready to read the next
code. If the reader sees three or more black columns in a row, it must indicate an
error and stay in its error state until it is reset by seeing two white columns.
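
One way to realize this description is as a six-state Mealy machine. The Python sketch below is a reconstruction from the prose only; the state names, the choice to emit a digit when the white column that ends a black run is read, and the error symbol E are assumptions, and the book's own diagram may differ.

```python
# (state, column) -> (next state, output); "" means no output on that transition.
BARCODE = {
    ("idle", "W"): ("idle", ""),  ("idle", "B"): ("b1", ""),
    ("b1", "B"): ("b2", ""),      ("b1", "W"): ("w1", "0"),   # single black = 0
    ("b2", "B"): ("err", "E"),    ("b2", "W"): ("w1", "1"),   # double black = 1
    ("w1", "B"): ("b1", ""),      ("w1", "W"): ("idle", ""),  # two whites: reset
    ("err", "B"): ("err", ""),    ("err", "W"): ("errw", ""),
    ("errw", "B"): ("err", ""),   ("errw", "W"): ("idle", ""),
}

def read_barcode(columns):
    """Run the transducer over a string of B and W symbols."""
    state, out = "idle", []
    for col in columns:
        state, emitted = BARCODE[(state, col)]
        out.append(emitted)
    return "".join(out)

decoded = read_barcode("WWBWBBWBWW")
```

In the error states no digits are produced, so a malformed code yields only the error symbol until two consecutive white columns reset the reader.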


Interpreters for finite state transducers can be built using techniques similar to the
ones that we used in Section 5.6 to build interpreters for finite state machines.

5.10 Bidirectional Transducers •

A process that reads an input string and constructs a corresponding output string can
be described in a variety of different ways. Why should we choose the finite state
transducer model? One reason is that it provides a declarative, rather than a procedural,
way to describe the relationship between inputs and outputs. Such a declarative model
can then be run in two directions. For example:

• To read an English text requires transforming a word like “liberties” into the root
word “liberty” and the affix PLURAL. To generate an English text requires
transforming a root word like “liberty” and the semantic marker “PLURAL” into the
surface word “liberties”. If we could specify, in a single declarative model, the
relationship between surface words (the ones we see in text) and underlying root words
and affixes, we could use it for either application.


The facts about English spelling rules and morphological analysis can be
described with a bidirectional finite state transducer. (L.1)


• The Soundex system, described below in Example 5.33, groups names that sound
alike. To create the Soundex representation of a name requires a set of rules for
mapping the spelling of the name to a unique four character code. To find other
names that sound like the one that generated a particular code requires running
those same rules backwards.

• Many things we call translators need to run in both directions. For example, consider
translating between Roman numerals and Arabic ones.

If we expand the definition of a Mealy machine to allow nondeterminism, then any
of these bidirectional processes can be represented. A nondeterministic Mealy
machine can be thought of as defining a relation between one set of strings (for example,
English surface words) and a second set of strings (for example, English underlying
root words, along with affixes). It is possible that we will need a machine that is
nondeterministic in one or both directions because the relationship between the two sets may
not be able to be described as a function.


EXAMPLE  5.32  Letter  Substitution 

When we define a regular language, it doesn't matter what alphabet we use.
Anything that is true of a language L defined over the alphabet {a, b} will also be true
of the language L' that contains exactly the strings in L except that every a has
been replaced by a 0 and every b has been replaced by a 1. We can build a simple
bidirectional transducer that can convert strings in L to strings in L' and vice versa.


Of course, the real power of bidirectional finite state transducers comes from their
ability to model more complex processes.

EXAMPLE  5.33  Soundex:  A  Way  to  Find  Similar  Sounding  Names 

People change the spelling of their names. Sometimes the spelling was changed
for them when they immigrated to a country with a different language, a different
set of sounds, and maybe a different writing system. For various reasons, one
might want to identify other people to whom one is related. But because of
spelling changes, it isn't sufficient simply to look for people with exactly the same
last name. The Soundex system was patented by Margaret O'Dell and Robert
C. Russell in 1918 as a solution to this problem. The system maps any name to a
four character code that is derived from the original name but that throws away
details of the sort that often get perturbed as names evolve. So, to find related
names, one can run the Soundex transducer in one direction, from a starting name
to its Soundex code and then, in the other direction, from the code to the other
names that share that code. For example, if we start with the name Kaylor, we will
produce the Soundex code K460. If we then use that code and run the transducer
backwards, we can generate the names Kahler, Kaler, Kaylor, Keeler, Kellar,
Kelleher, Keller, Kelliher, Kilroe, Kilroy, Koehler, Kohler, Koller, and Kyler.

The Soundex system is described by the following set of rules for mapping
from a name to a Soundex code:

1. If two or more adjacent letters (including the first in the name) would map
   to the same number if rule 3.1 were applied to them, remove all but the first
   in the sequence.

2. The first character of the Soundex code will be the first letter of the name.

3. For all other letters of the name do:

   3.1. Convert the letters B, P, F, V, C, S, G, J, K, Q, X, Z, D, T, L, M, N, and R to
        numbers using the following correspondences:

        B, P, F, V = 1.

        C, S, G, J, K, Q, X, Z = 2.

        D, T = 3.

        L = 4.

        M, N = 5.

        R = 6.

   3.2. Delete all instances of the letters A, E, I, O, U, Y, H, and W.

4. If the string contains more than three numbers, delete all but the leftmost
   three.

5. If the string contains fewer than three numbers, pad with 0's on the right to
   get three.

Here's an initial fragment of a finite-state transducer that implements the
relationship between names and Soundex codes. The complete version of this
machine can input a name and output a code by interpreting each transition labeled
x/y as saying that the transition can be taken on input x and it will output y. Going
the other direction, it can input a code and output a name if it interprets each
transition the other way: On input y, take the transition and output x. To simplify
the diagram, we've used two conventions: The symbol # stands for any one of the
letters A, E, I, O, U, Y, H, or W. And a label of the form x, y, z/a is a shorthand for
three transitions labeled x/a, y/a, and z/a. Also, the states are named to indicate
how many code symbols have been generated/read.


Notice that in one direction (from names to codes), this machine operates
deterministically. But, because information is lost in that direction, if we run the
machine in the direction that maps from codes to names, it becomes nondeterministic.

For example, the ε-transitions can be traversed any number of times to generate
vowels that are not represented in the code. Because the goal, in running the
machine in the direction from code to names, is to generate actual names, the system
that does this is augmented with a list of names found in U.S. census reports. It can
then follow paths that match those names.
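
The five rules, and the idea of running them backwards against a list of known names, can be sketched procedurally in Python. This is not the transducer itself but a direct coding of the rules as stated in this example; the function names, the decision to ignore non-letters, and the short name list used for the reverse lookup are assumptions made for illustration.

```python
CODES = {}
for letters, digit in (("BPFV", "1"), ("CSGJKQXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(name):
    """Apply rules 1 through 5 to a name and return its four character code."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    # Rule 1: in a run of adjacent letters with the same number, keep the first.
    kept = [letters[0]]
    for prev, ch in zip(letters, letters[1:]):
        if ch in CODES and CODES.get(prev) == CODES[ch]:
            continue
        kept.append(ch)
    # Rules 2 and 3: keep the first letter, convert the rest, and drop the
    # letters that have no number (A, E, I, O, U, Y, H, W).
    digits = [CODES[ch] for ch in kept[1:] if ch in CODES]
    # Rules 4 and 5: exactly three numbers, padded with 0's on the right.
    return kept[0] + "".join(digits[:3]).ljust(3, "0")

def names_for_code(code, known_names):
    """Code-to-names direction: filter a list of actual names by their code."""
    return [name for name in known_names if soundex(name) == code]

matches = names_for_code("K460", ["Kaylor", "Keller", "Smith", "Kilroy"])
```

Filtering a name list plays the role that the census list plays for the transducer: it restricts the nondeterministic code-to-name direction to names that actually occur.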

The Soundex system was designed for the specific purpose of matching names
in United States census data from the early part of the twentieth century and
before. Newer systems, such as Phonix and Metaphone, are attempts to solve the
more general problem of identifying words that sound similar to each other. Such
systems are used in a variety of applications, including ones that require matching
a broader range of proper names (e.g., genealogy and white pages look up) as well
as more general word matching tasks (e.g., spell checking).

5.11 Stochastic Finite Automata: Markov Models and HMMs •

Most of the finite state transducers that we have considered so far are deterministic. But
that is simply a property of the kinds of applications to which they are put. We do not want
to live in a world of nondeterministic traffic lights or phone switching circuits. So we
typically design controllers (i.e., machines that run things) to be deterministic. For some
applications though, nondeterminism can be useful. For example, it can add entertainment value.


Nondeterministic (possibly stochastic) FSMs can form the basis of video
games. (N.3.1)


But now consider problems like the name-evolution one we just discussed. Now we
are not attempting to build a controller that drives the world. Instead we are trying to
build a model that describes and predicts a world that we are not in control of.
Nondeterministic finite state models are often very useful tools in solving such problems. And
typically, although we do not know enough to predict with certainty how the behavior of the
model will change from one step to the next (thus the need for nondeterminism), we do
have some data that enable us to estimate the probability that the system will move from
one state to the next. In this section, we explore the use of nondeterministic finite state
machines and transducers that have been augmented with probabilistic information.

5.11.1  Markov  Models 

A Markov model is an NDFSM in which the state at each step can be predicted by a
probability distribution associated with the current state. Steps usually correspond to
time intervals, but they may correspond to any ordered discrete sequence. In essence
we replace transitions labeled with input symbols by transitions labeled with
probabilities. The usual definition of a Markov model is that its behavior at time t depends only
on its state at time t − 1 (although higher-order models may allow any finite number
of past states to play a role). Of course, if we eliminate an input sequence, that is exactly
the property that characterizes an FSM.


Markov models have been used in music composition. (N.1.1) They have also
been used to model the generation of many other sorts of content, including
Web pages.


Formally, a Markov model is a triple M = (K, π, A), where:

• K is a finite set of states,

• π is a vector that contains the initial probabilities of each of the states, and

• A is a matrix that represents the transition probabilities. A[p, q] = Pr(state q at time t
| state p at time t − 1). In other words, A[p, q] is the probability that, if M is in state
p, M will go to state q next.

Some definitions specify a unique start state, but this definition is more general. If
there is a unique start state, then its initial probability is 1 and the initial probabilities
of all other states are 0.

Notice that we have not mentioned any output alphabet. We will assume that the
output at each step is simply the name of the state of the machine at that step. The
sequence of outputs produced by a Markov model is often called a Markov chain.


The link structure of the World Wide Web can be modeled as a Markov
chain, where the states correspond to Web pages and the probabilities
describe the likelihood, in a random walk, of going from one page to the next.
Google's PageRank is based on the limits of those probabilities.


Given a Markov model that describes some random process, we can answer either
of the following questions:

• What is the probability that we will observe a particular sequence s₁s₂…sₙ of
states? We can compute this as follows, using the probability that s₁ is the start state
and then multiplying by the probabilities of each of the transitions:

   Pr(s₁s₂…sₙ) = π[s₁] · ∏ᵢ₌₂ⁿ A[sᵢ₋₁, sᵢ]

• If the process runs for an arbitrarily long sequence of steps, what is likely to be the
result? More specifically, for each state in the system, what is the probability that
the system will land in that state?


EXAMPLE  5.34  A  Simple  Markov  Model  of  the  Weather 

Suppose that we have the following model for the weather where we live. This model
assumes that the weather on day t is influenced only by the weather on day t − 1.


We are considering a five day camping trip and want to know the probability of
five sunny days in a row. So we want to know the probability of the sequence
Sunny Sunny Sunny Sunny Sunny. The model tells us that it is:

.4 · (.75)⁴ = .1266

Or we could ask, given that it's sunny today, what is the probability that, if we
leave now, it will stay sunny for four more days. Now we assume that the model
starts in state Sunny, so we compute:

(.75)⁴ = .316
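
Both calculations, and the long-run question posed earlier, can be checked with a few lines of Python. Only π[Sunny] = .4 and A[Sunny, Sunny] = .75 can be read off the arithmetic above; the second state and the Rainy row of the matrix are hypothetical values added so that the sketch is a complete model, and the function names are assumptions.

```python
def sequence_probability(pi, A, states):
    """Pr(s1 s2 ... sn): pi[s1] times the product of the transition probabilities."""
    prob = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= A[prev][cur]
    return prob

def long_run_distribution(pi, A, steps=200):
    """Push the initial distribution through the transition matrix repeatedly."""
    dist = dict(pi)
    for _ in range(steps):
        dist = {q: sum(dist[p] * A[p][q] for p in A) for q in A}
    return dist

pi = {"Sunny": 0.4, "Rainy": 0.6}
A = {"Sunny": {"Sunny": 0.75, "Rainy": 0.25},
     "Rainy": {"Sunny": 0.30, "Rainy": 0.70}}  # Rainy row is hypothetical

five_sunny = sequence_probability(pi, A, ["Sunny"] * 5)
stay_sunny = sequence_probability({"Sunny": 1.0}, A, ["Sunny"] * 5)
steady = long_run_distribution(pi, A)
```

Starting the second computation from the distribution {Sunny: 1} is how "given that it's sunny today" is expressed in the triple (K, π, A). The values in steady depend on the hypothetical Rainy row and so illustrate the method rather than the book's model.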


EXAMPLE  5.35  A  Simple  Markov  Model  of  System  Performance 

Markov models are used extensively to model the performance of complex
systems of all kinds, including computers, electrical grids, and manufacturing plants.
While real models are substantially more complex, we can see how these models
work by taking Example 5.34 and renaming the states:


To make it a bit more realistic, we've changed the probabilities so that they
describe a system that actually works most of the time. We'll also use smaller time
intervals, say seconds. Now we might ask, “Given that the system is now up, what
is the probability that the system will stay up for an hour (i.e., for 3600 time steps)?”
The (possibly surprising) answer is:

.95³⁶⁰⁰ = 6.3823 · 10⁻⁸¹
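
The figure is easy to verify, and doing so shows why it is so small: a per-second survival probability of .95 is compounded 3600 times. In this small Python check, the value .95 is read off the formula above and the variable names are assumptions.

```python
import math

p_up = 0.95   # probability of staying up for one more second
steps = 3600  # one hour of one-second time steps

stay_up = p_up ** steps
# The same quantity as a base-10 logarithm, which stays usable even for
# horizons long enough that the direct power would underflow to 0.0.
log10_stay_up = steps * math.log10(p_up)
```

The logarithm is about -80.195, which corresponds to the stated value of roughly 6.38 · 10⁻⁸¹.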


EXAMPLE  5.36  Population  Genetics 

In  this  example  we  consider  a  simple  problem  in  population  genetics.  For  a  survey 
of  the  biological  concepts  behind  this  example,  see  Appendix  K.  Suppose  that  we 
are  interested  in  the  effect  of  inbreeding  on  the  gene  pool  of  a  diploid  organism  (an 
organism,  such  as  humans,  in  which  each  individual  has  two  copies  of  each  gene). 
Consider  the  following  simple  model  of  the  inheritance  of  a  single  gene  with  two  al¬ 
leles  (values):  A  and  B. There  are  potentially  three  kinds  of  individuals  in  the  popu¬ 
lation:  the  AA  organisms,  the  BB  organisms,  and  the  AB  organisms.  Because  we  are 
studying  inbreeding,  we’ll  make  the  assumption  that  individuals  always  mate  with 
others  who  are  genetically  similar  to  themselves  and  so  possess  the  same  gene  pair. 

To simplify our model, we will assume that one couple mates, has two children,
and dies. So we can think of each individual as replacing itself and then dying. We
can build the following Markov model of a chain of descendants. Each step now
corresponds to a generation.

AA pairs can produce only AA offspring. BB pairs can produce only BB offspring.
But what about AB pairs? What is their fate? We can answer this question
by considering the probability that the model, if it starts in state AB and runs for
some number of generations, will land in state AB. That probability is $.5^n$, where n
is the number of generations. As n grows, that number approaches 0. We show
how quickly it does so in the following table:


n      Pr(AB)

1      .5
5      .03125
10     .0009765625
100    $7.8886 \cdot 10^{-31}$

After only 10 generations, very few heterozygous individuals (i.e., possessing two
different alleles) remain. After 100 generations, almost none do. If there is a survival
advantage in being heterozygous, this could be a disaster for the population. The
disaster can be avoided, of course, if individuals mate with genetically different individuals.
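
The table can be reproduced by stepping a probability distribution through the model one generation at a time. The sketch below is illustrative, not the book's own code. The .5 probability of staying in AB comes from the text; the two .25 entries in the AB row are an assumption (ordinary Mendelian inheritance for the children of an AB pair), and the function names are hypothetical.

```python
# States of the inbreeding model, in a fixed order.
STATES = ["AA", "AB", "BB"]

# Transition matrix: row = current state, column = next state.
# The AB row's .5 is from the text; the .25 entries are an ASSUMPTION
# (Mendelian inheritance: a child of an AB x AB pair is AA, AB, or BB
# with probability 1/4, 1/2, 1/4).
A = [
    [1.0, 0.0, 0.0],    # AA pairs produce only AA offspring
    [0.25, 0.5, 0.25],  # AB pairs
    [0.0, 0.0, 1.0],    # BB pairs produce only BB offspring
]

def step(dist, matrix):
    """Advance a probability distribution over STATES by one generation."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def distribution_after(generations, start="AB"):
    """Distribution over STATES after the given number of generations."""
    dist = [1.0 if s == start else 0.0 for s in STATES]
    for _ in range(generations):
        dist = step(dist, A)
    return dist

for n in (1, 5, 10, 100):
    print(n, distribution_after(n)[STATES.index("AB")])
```

Because AA and BB can never be left once entered, all of the probability that drains out of AB accumulates in those two states.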


Where do the probabilities in a Markov model come from? In some simple cases,
they may be computed by hand and added to the system. In most cases, however, they
are computed by examining real datasets and discovering the probabilities that best
describe those data. So, for example, the probabilities we need for the system performance
model of Example 5.35 could be extracted from a log of system behavior over some
recent period of time. To see how this can be done, suppose that we have observed the
output sequences T P T Q P Q T and S S P T P Q Q P S T Q P T T P. The correct value for
A[P, Q] is the number of times the pair P Q appears in the sequences divided by the total
number of times that P appears in any position except the last. Similarly, the correct
value for π[P] is the total number of times that P is the first symbol in a sequence divided
by the total number of sequences. In realistic problem contexts, the models are huge and
they evolve over time. There exist more computationally tractable algorithms for
updating the probabilities (and, when necessary, the states) of such models.
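
The counting procedure just described can be sketched in a few lines. This is a minimal illustration, not a tuned implementation; the function name `estimate_markov` and the dictionary layout are assumptions made for the example, and each observed sequence is assumed to be nonempty.

```python
from collections import Counter, defaultdict

def estimate_markov(sequences):
    """Estimate (pi, A) for a first-order Markov model by counting."""
    # pi[s]: fraction of sequences whose first symbol is s.
    first_counts = Counter(seq[0] for seq in sequences)
    # pair_counts[p][q]: number of times the pair p q occurs.
    pair_counts = defaultdict(Counter)
    for seq in sequences:
        for p, q in zip(seq, seq[1:]):
            pair_counts[p][q] += 1
    pi = {s: c / len(sequences) for s, c in first_counts.items()}
    # Dividing by the row total is dividing by the number of times p
    # appears in any position except the last.
    A = {
        p: {q: c / sum(nexts.values()) for q, c in nexts.items()}
        for p, nexts in pair_counts.items()
    }
    return pi, A

# The two observed sequences from the text, one character per state.
observed = ["TPTQPQT", "SSPTPQQPSTQPTTP"]
pi, A = estimate_markov(observed)
print(A["P"]["Q"])       # pair P Q occurs 2 times; P has 6 non-final occurrences
print(pi.get("P", 0.0))  # P never starts a sequence
```

States that never appear in a given position simply have no entry, which is why the lookup for π[P] uses a default of 0.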


Substantial work has been done on efficient techniques for updating the huge
Markov model of the World Wide Web that is used to compute Google’s
PageRanks. Note here that both the state set (corresponding to the set of
pages on the Web) and the probabilities (which depend on the link
structure of the Web) must be regularly revised.


All of the Markov models we have presented so far have the property that their
behavior at step t is a function only of their state at step t - 1. Such models are called
first-order. To build a first-order model with k states requires that we specify $k^2$ transition
probabilities. Now suppose that we wish to describe a situation in which what happens
next depends on the previous two states. Or the previous three. Using the same
techniques that we used to build a first-order model, we can build models that consider the
previous n states for any fixed n. Such models are called nth-order Markov models. Notice
that an nth-order model requires $k^{n+1}$ transition probabilities. But if there are enough
data available to train a higher-order model (i.e., to assign appropriate probabilities to
all of the required transitions), it may be possible to build a system that quite accurately
mimics the behavior of a very complex system.
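
To make the idea of an nth-order model concrete, here is a small sketch that treats each distinct word of a text as a state. It is an illustration under stated assumptions: the toy corpus and the names `train_order_n` and `generate` are hypothetical, and instead of storing explicit probabilities it keeps, for every n-word context, the list of words seen to follow it, so that choosing uniformly from that list samples with the observed relative frequencies.

```python
import random
from collections import defaultdict

def num_transition_probabilities(k, n):
    """Number of transition probabilities in an nth-order model with k states."""
    return k ** (n + 1)

def train_order_n(words, n):
    """Map each n-word context to the list of words observed to follow it."""
    if n < 1:
        raise ValueError("order n must be at least 1")
    followers = defaultdict(list)
    for i in range(len(words) - n):
        followers[tuple(words[i:i + n])].append(words[i + n])
    return followers

def generate(followers, n, length, rng):
    """Generate up to `length` words, starting from a randomly chosen context."""
    out = list(rng.choice(sorted(followers)))
    while len(out) < length:
        options = followers.get(tuple(out[-n:]))
        if not options:       # this context was never seen with a successor
            break
        out.append(rng.choice(options))
    return out

corpus = "the cat sat on the mat and the cat ran".split()
model = train_order_n(corpus, 2)
print(" ".join(generate(model, 2, 8, random.Random(0))))
print(num_transition_probabilities(k=len(set(corpus)), n=2))
```

Every run of n + 1 consecutive generated words is guaranteed to occur somewhere in the training text, which is why raising n makes the output look more and more like a copy of the original.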


A third-order Markov model, trained on about half of this book, used word
frequencies to generate the text “The Pumping Theorem is a useful way to
define a precedence hierarchy for the operators + and *.” (L.3.2) A clever
application of a higher-order Markov model of English is in producing spam
that is hard to detect. (L.3.2)


Early work on the use of Markov models for musical composition suggested
that models of order four or less tended to create works that seemed random,
while models of order seven or more tended to create works that felt
just like copies of works on which the model was trained. (N.1.1)


Whenever we build a Markov model to describe a naturally occurring process, there
is a sense in which we are using probabilities to hide an underlying lack of understanding
that would enable us to build a deterministic model of the phenomenon. So, for
example, if we know that our computer system is more likely to crash in the morning than
in the evening, that may show up as a pair of different probabilities in a Markov model,
even if we have no clue why the time of day affects system performance. Some Markov
models that do a pretty good job of mimicking nature may seem silly to us for exactly
that reason. The one that generates random English text is a good example of that. But
now suppose that we had a model that did a very good job of predicting earthquakes.
Although we might rather have a good structural model that tells us why earthquakes
happen, a purely statistical, predictive model would be a very useful tool. It is because
of cases like this that Markov models can be extremely valuable tools for anyone
studying complex systems (be they naturally occurring ones like plate tectonics or
engineering artifacts like computer systems).

5.11.2  Hidden  Markov  Models 

Now suppose that we are interested in analyzing a system that can be described with
a Markov model with one important difference: The states of the system are not
directly observable. Instead the model has a separate set of output symbols, which are
emitted, with specified probabilities, whenever the system enters one of its now
“hidden” states. Now we must base our analysis of the system on an observed sequence of
output symbols, from which we can infer, with some probability, the actual sequence
of states of the underlying model.

Examples  of  significant  problems  that  can  be  described  in  this  way  include: 

•  DNA and protein evolution: A protein is a sequence of amino acids that is
manufactured in living organisms according to a DNA blueprint. Mutations that change
the blueprint can occur, with the result that one amino acid may be substituted for
another, one or more amino acids may be deleted, or one or more additional amino
acids may be inserted. When we examine a DNA fragment or a protein, we’d like to
be able to reconstruct the evolutionary process so that we can find other proteins
that are functionally related to the current one, even though its details may be
different. But the process isn’t visible; only its result is.


HMMs are used for DNA and protein sequence alignment in the face of
mutations and other kinds of evolutionary change. (K.3.3)


•  Speech understanding: When we talk, our mouths map from the sentences we want
to say into sequences of sounds. The mapping is complex and nondeterministic
since multiple words may map to the same sound, words are pronounced differently
as a function of the words before and after them, we all form sounds slightly
differently, and so forth. All a listener can hear is the sequence of sounds. (S)he would
like to reconstruct the mapping (backwards) in order to determine what words we
were attempting to say.


HMMs  are  used  extensively  in  speech  understanding  systems.  (L.5) 


•  Optical character recognition (OCR): When we write, our hands map from an
idealized symbol to some set of marks on a page. The marks are observable, but the
process that generates them isn’t. Imagine that we could describe a probabilistic
process corresponding to each symbol that we can write. Then, to interpret the marks,
we must select the process that is most likely to have generated the marks we can see.

What  is  a  Hidden  Markov  Model? 

A powerful technique for solving problems such as these is the hidden Markov model or
HMM. An HMM is a nondeterministic finite state transducer that has been
augmented with three kinds of probabilistic information:

•  Each  state  is  labeled  with  the  probability  that  the  machine  will  be  in  that  state  when 
it  starts. 

•  Each transition from some state p to some (possibly identical) state q is labeled with
the probability that, whenever the machine is in state p, it will go next to state q. We
can specify M's transition behavior completely by defining these probabilities. If it is
not possible for M to go from some state p to some other state q, then we simply
state the probability of going from p to q as 0.

•  Each output symbol c at each state q is labeled with the probability that the
machine, if it is in state q, will output c.

Formally, an HMM M is a quintuple (K, O, π, A, B), where:

•  K is a finite set of states.

•  O is the output alphabet.

•  π is a vector that contains the initial probabilities of each of the states.

•  A is a matrix that represents the transition probabilities. A[p, q] = Pr(state
q at time t | state p at time t - 1).

•  B, sometimes called the confusion matrix, represents the output probabilities.
B[q, o] = Pr(output o | state q). Note that outputs are associated with states (as in
Moore machines).
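
One way to hold the quintuple in a program is sketched below. The class layout, the `check` method, and the two-state machine are all hypothetical, chosen only to mirror the five components; the requirement that π and each row of A and B sum to 1 is the usual condition for probability distributions rather than something the definition above states explicitly.

```python
from dataclasses import dataclass

@dataclass
class HMM:
    K: list    # states
    O: list    # output alphabet
    pi: dict   # pi[q] = probability of starting in state q
    A: dict    # A[p][q] = Pr(state q at time t | state p at time t - 1)
    B: dict    # B[q][o] = Pr(output o | state q)

    def check(self, tol=1e-9):
        """Raise ValueError unless pi and every row of A and B sums to 1."""
        def sums_to_one(values):
            return abs(sum(values) - 1.0) <= tol
        if not sums_to_one(self.pi[q] for q in self.K):
            raise ValueError("pi does not sum to 1")
        for p in self.K:
            if not sums_to_one(self.A[p][q] for q in self.K):
                raise ValueError(f"row A[{p}] does not sum to 1")
            if not sums_to_one(self.B[p][o] for o in self.O):
                raise ValueError(f"row B[{p}] does not sum to 1")

# Hypothetical two-state machine, used only to exercise the structure.
toy = HMM(
    K=["s0", "s1"],
    O=["x", "y"],
    pi={"s0": 0.6, "s1": 0.4},
    A={"s0": {"s0": 0.9, "s1": 0.1}, "s1": {"s0": 0.5, "s1": 0.5}},
    B={"s0": {"x": 0.8, "y": 0.2}, "s1": {"x": 0.3, "y": 0.7}},
)
toy.check()
```

An impossible transition is represented, as in the text, by an explicit probability of 0 rather than by a missing entry.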

The name “hidden Markov model” derives from the two key properties of such devices:

•  They  are  Markov  models.  Their  state  at  time  t  is  a  function  solely  of  their  state  at 
time  t  -  1. 

•  The  actual  progression  of  the  machine  from  one  state  to  the  next  is  hidden  from  all 
observers.  Only  the  machine's  output  string  can  be  observed. 

To use an HMM as the basis for an application program, we typically have to solve
some or all of the following problems:

•  The decoding problem: Given an observation sequence O and an HMM M,
discover the path through M that is most likely to have produced O. For example, O
might be a string of words that form a sentence. We might have an HMM that
describes the structure of naturally occurring English sentences. Each state in M
corresponds to a part of speech, such as noun, verb, or adjective. It's not possible to tell,
just by looking at O, what sequence of parts of speech generated it, since many
words can have more than one part of speech. (Consider, for example, the simple
English sentence, “Hit the fly ball.”) But we need to infer the parts of speech (a process
called part of speech or POS tagging) before we can parse the sentence. We can do
that if we can find the path through the HMM that is the most likely to have
generated the observed sentence. This problem can be solved efficiently using a dynamic
programming algorithm called the Viterbi algorithm, described below.


HMMs are often used for part of speech tagging. (L.2)


Suppose that the sequences that we observe correspond to original sequences
that have been altered in some way. The alteration may have been done intentionally
(we'll call this “obfuscation”) or it may be the result of a natural phenomenon
like evolution or a noisy transmission channel. In either case, if we want to know
what the original sequence was, we have an instance of the decoding problem. We
seek to find the original sequence that is most likely to have been the one that got
transformed into the observed sequence.


In the Internet era, an important application of obfuscation is the generation
of spam. If specific words are known to trigger spam filters, they can be
altered, by changing vowels, introducing special characters, or whatever, so
that they are still recognizable to people but unrecognizable, at least until the
next patch, to the spam filters. HMMs can be used to perform “deobfuscation”
in an attempt to foil the obfuscators.


•  The evaluation problem: Given an observation sequence O and a set of HMMs
that describe a collection of possible underlying models, choose the HMM that is
most likely to have generated O. For example, O might be a sequence of sounds. We
might have one HMM for each of the words that we know. We need to choose the
word model that is most likely to have generated O. As another example, consider
again the protein problem: Now we have one HMM for each family of related
proteins. Given a new sample, we want to find the family to which it is most likely to be
related. So we look for the HMM that is most likely to have generated it. This
problem can be solved efficiently using the forward algorithm, which is very similar to
the Viterbi algorithm except that it considers all paths through a candidate HMM,
rather than just the most likely one.

•  The training problem: We typically assume, in crafting an HMM M, that the set K of
states is built by hand. But where do all the probabilities in π, A, and B come from?
Fortunately, there are algorithms that can learn them from a set of training data (i.e.,
a set of observed output sequences O). One of the most commonly used algorithms is
the Baum-Welch algorithm, also called the forward-backward algorithm. Its goal is
to tune π, A, and B so that the resulting HMM M has the property that, out of all the
HMMs whose state set is equal to K, M is the one most likely to have produced the
outputs that constitute the training set. Because the states cannot be directly observed
(as they can be in a standard Markov model), the training technique that we
described in Section 5.11.1 won’t work here. Instead, the Baum-Welch algorithm
employs a technique called expectation maximization or EM. It is an iterative method,
so it begins with some initial set of values for π, A, and B. Then it runs the forward
algorithm, along with a related backward algorithm, on the training data. The result of
this step is a set of probabilities that describe the likelihood that the existing machine,
with the current values of π, A, and B, would have output the training set. Using those
probabilities, Baum-Welch updates π, A, and B to increase those probabilities. The
process continues until no changes to the parameter values can be made.

The Viterbi Algorithm

Given an HMM M and an observed output sequence O, a solution to the decoding
problem is the path through M that is most likely to have produced O. One way to find
that most likely path is to explore all paths of length |O|, keeping track of the
accumulated probabilities, and then report the path whose probability is the highest. This
approach is straightforward, but may require searching a tree with $|K_M|^{|O|}$ nodes, so the
time required may grow exponentially in the length of O.
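
The exhaustive approach is easy to write down, which makes it a useful baseline for checking faster methods on tiny inputs. The sketch below is illustrative only. It borrows the London weather model of Example 5.37 (the Rainy self-loop value .7 is an assumption, chosen so that the Rainy row of A sums to 1), it charges each output to the state being left, as this section's algorithms do, and the function names are hypothetical.

```python
from itertools import product

# London model of Example 5.37. The Rainy -> Rainy value is an ASSUMPTION
# (chosen so that the row sums to 1).
K = ["Sunny", "Rainy"]
pi = {"Sunny": 0.55, "Rainy": 0.45}
A = {"Sunny": {"Sunny": 0.75, "Rainy": 0.25},
     "Rainy": {"Sunny": 0.3, "Rainy": 0.7}}
B = {"Sunny": {"L": 0.7, "#": 0.3},
     "Rainy": {"L": 0.2, "#": 0.8}}

def path_probability(path, outputs):
    """Probability of following `path` (len(outputs) + 1 states) while
    emitting `outputs`, one symbol per transition, from the state left."""
    prob = pi[path[0]]
    for t, symbol in enumerate(outputs):
        p, q = path[t], path[t + 1]
        prob *= A[p][q] * B[p][symbol]
    return prob

def brute_force_decode(outputs):
    """Try every state sequence and keep the most probable one."""
    best = max(product(K, repeat=len(outputs) + 1),
               key=lambda path: path_probability(path, outputs))
    # Drop the final state: it is reached after the last observed output.
    return list(best[:-1]), path_probability(best, outputs)

print(brute_force_decode("###L"))
```

The call to `product` enumerates every state sequence, so the running time grows exponentially with the length of the observation; that is the cost the dynamic programming approach avoids.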

A more efficient approach uses a dynamic programming technique in which the
most likely path of some length, say i, is computed once and then extended by one
more step to find the most likely path of length i + 1. The Viterbi algorithm uses this
approach. It solves the decoding problem by computing, for each step t and for each
state q in M:

•  The most likely path to q of all the ones that would have generated $O_1 \ldots O_t$.

•  The probability of that path.

Once it has done that for each step for which an output was observed, it traces the
path backwards. It assumes that the last state is the one at the end of the overall most
likely path. The next to the last state is the one that preceded that one on the most likely
path, and so forth.

Assume, at each step t, that the algorithm has already considered all paths of length
t - 1 that could have generated $O_1 \ldots O_{t-1}$. From those paths, it has selected, for each
state p, the most likely path to p and it has recorded the probability of the model
taking that path, reaching p, and producing $O_1 \ldots O_{t-1}$. We assume further that the
algorithm has also recorded, at each state p, the state that preceded p on that most likely
path. Before the first output symbol is observed, the probability that the system has
reached some state p is simply π[p] and there is no preceding state.

Because the model is Markovian, the only thing that affects the probability of the next
state is the previous state. In constructing the model, we assumed that prior history
doesn’t matter (although that may be only an approximation to reality for some
problems). So, at step t, we compute, for each state q, the probability that the best path so far
that is consistent with $O_1 \ldots O_t$ ends in q and outputs the first t observed symbols. We do
this by considering each state p that the model could have been in at step t - 1. We
already know the probability that the best path up to step t - 1 landed in p and produced
the observed output sequence. So, to add one more step, we multiply that probability by
A[p, q], the probability that the model, if it were in p, would go next to q. But we have one
more piece of information: the next output symbol. So, to compute the probability that the
model went through p, landed in q, and output the next symbol o, we multiply by B[p, o].
Once these numbers have been computed for all possible preceding states p, we choose
the most likely one (i.e., the one with the highest score as described above). We record that
score at q and we record at q that the most likely predecessor state is the one that
produced that highest score.

Although we've described the output function as a function of the state the model is
in, we don't actually consider it until we compute the next step, so it may be easier to
think of the outputs as associated with the transitions rather than with the states. In
particular, the computation that we have just described will end by choosing the state
in which the model is most likely to land just after it outputs the final observed symbol.
That last state will not generate any output.

Once all steps have been considered, we can choose the overall most likely path as
follows: Consider all states. The model is most likely to have ended in the one that, at
the final time step, has the highest score as described above. Call that highest scoring
state the last state in the path. Find the state that was marked as immediately
preceding that one. Continue backwards to the start state.

We can summarize this process, known as the Viterbi algorithm, as follows: Given
an observed output sequence O, we will consider each time step between 1 and the
length of O. At each such step t, we will set score[q, t] to the highest probability
associated with any path of length t that lands M in q, having output the first t symbols in O.
We will set backptr[q, t] to the state that immediately preceded q along that best path.
Once score and backptr have been computed for each state at each time step t, we can
start at the most likely final state and trace backwards to find the sequence of states that
describes the most likely path through M consistent with O. So the Viterbi algorithm is:

Viterbi(M: Markov model, O: output sequence) =

1.  For t = 0, for each state q, set score[q, t] to π[q].

2.  /* Trace forward recording the best path at each step:
    For t = 1 to |O| do:

    2.1.  For each state q in K do:

          2.1.1.  For each state p in K that could have immediately preceded q:
                  candidatescore[p] = score[p, t - 1] * A[p, q] * B[p, $O_t$].

          2.1.2.  /* Record score along most likely path:
                  score[q, t] = $\max_{p \in K}$ candidatescore[p].

          2.1.3.  /* Set q's backptr. The function argmax returns the value of the
                  argument p that produced the maximum value of candidatescore[p]:
                  backptr[q, t] = $\mathrm{argmax}_{p \in K}$ candidatescore[p].

/* Retrieve the best path by going backwards from the most likely last state:

3.  states[|O|] = the state q with the highest value of score[q, |O|].

4.  For t = |O| - 1 to 0 do:

    4.1.  states[t] = backptr[states[t + 1], t + 1].

5.  Return states[0 : |O| - 1].  /* Ignore the last state since its output
                                    was not observed.
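
The numbered steps above translate fairly directly into code. The following is a sketch, not a reference implementation: the dictionary-based representation and the function signature are assumptions, it keeps this section's convention that the output at step t is charged to the state p being left, and the London parameters come from Example 5.37 (with the Rainy self-loop value .7 assumed so that the row sums to 1).

```python
def viterbi(K, pi, A, B, outputs):
    """Most likely state sequence for `outputs`, following steps 1-5 above.

    Returns (states, score), where score[t][q] is the best-path probability
    of landing in q after emitting the first t symbols."""
    T = len(outputs)
    score = [{q: pi[q] for q in K}]          # step 1: t = 0
    backptr = [{}]                           # no predecessor at t = 0
    for t in range(1, T + 1):                # step 2
        symbol = outputs[t - 1]              # O_t (the text counts from 1)
        score.append({})
        backptr.append({})
        for q in K:                          # step 2.1
            candidates = {p: score[t - 1][p] * A[p][q] * B[p][symbol]
                          for p in K}        # step 2.1.1
            best_p = max(candidates, key=candidates.get)
            score[t][q] = candidates[best_p]     # step 2.1.2
            backptr[t][q] = best_p               # step 2.1.3
    states = [None] * (T + 1)
    states[T] = max(K, key=lambda q: score[T][q])    # step 3
    for t in range(T - 1, -1, -1):                   # step 4
        states[t] = backptr[t + 1][states[t + 1]]
    return states[:T], score                         # step 5: drop last state

# London model of Example 5.37 (Rainy -> Rainy = .7 is an ASSUMPTION).
K = ["Sunny", "Rainy"]
pi = {"Sunny": 0.55, "Rainy": 0.45}
A = {"Sunny": {"Sunny": 0.75, "Rainy": 0.25},
     "Rainy": {"Sunny": 0.3, "Rainy": 0.7}}
B = {"Sunny": {"L": 0.7, "#": 0.3},
     "Rainy": {"L": 0.2, "#": 0.8}}

path, score = viterbi(K, pi, A, B, "###L")
print(path)
```

For the report ###L this yields Rainy, Rainy, Rainy, Sunny. Real implementations usually work with sums of log probabilities instead of products, since products of many small numbers underflow; that refinement is omitted here to stay close to the pseudocode.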


The  Forward  Algorithm 

Now suppose that we want to solve the evaluation problem: Given a set of HMMs
and an observed output sequence O, decide which HMM had the highest probability
of producing O. This problem can be solved with the forward algorithm,
which is very similar to the Viterbi algorithm except that, instead of finding the
single best path through an HMM M, it computes the probability that M could have
output O along any path. In step 2.1.2, the Viterbi algorithm selects the highest
score associated with any one path to q. The forward algorithm, at that point, sums
all the scores. The other big difference between the Viterbi algorithm and the
forward algorithm is that the forward algorithm does not need to find a particular
path. So it will not have to bother maintaining the backptr array. We can state the
algorithm as follows:

forward(M: Markov model, O: output sequence) =

1.  For t = 0, for each state q, set forwardscore[q, t] to π[q].

2.  /* Trace forward recording, at each step, the total probability associated with all
    paths to each state:
    For t = 1 to |O| do:

    2.1.  For each state q in K do:

          2.1.1.  Consider each state p in K that could have immediately preceded q:
                  candidatescore[p] = forwardscore[p, t - 1] * A[p, q] * B[p, $O_t$].

          2.1.2.  /* Sum scores over all paths:
                  forwardscore[q, t] = $\sum_{p \in K}$ candidatescore[p].

3.  /* Find the total probability of going through M along any path, landing in any of
    M's states, and emitting O. This is simply the sum of the probability of landing in
    state 1 having emitted O, plus the probability of landing in state 2 having emitted
    O, and so forth. So:
    totalprob = $\sum_{q \in K}$ forwardscore[q, |O|].

4.  Return totalprob.

To solve the evaluation problem, we run the forward algorithm on all of the
contending HMMs and return the one with the highest final score.
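
The forward computation, and its use to choose among competing models, can be sketched as follows. This is an illustration under stated assumptions: the function signature and dictionary layout are hypothetical, only the current column of forwardscore is kept, and the two weather models are read from Example 5.37, where the London Rainy self-loop (.7) and the Athens Sunny self-loop (.9) are assumed so that each row of A sums to 1.

```python
def forward(K, pi, A, B, outputs):
    """Total probability that the model emits `outputs` along any path."""
    fwd = {q: pi[q] for q in K}                      # step 1: t = 0
    for symbol in outputs:                           # step 2
        # Steps 2.1.1 and 2.1.2: sum, rather than maximize, over predecessors.
        fwd = {q: sum(fwd[p] * A[p][q] * B[p][symbol] for p in K)
               for q in K}
    return sum(fwd.values())                         # steps 3 and 4

K = ["Sunny", "Rainy"]

# Parameters read from Example 5.37; the self-loop values .7 (London, Rainy)
# and .9 (Athens, Sunny) are ASSUMPTIONS that make each row sum to 1.
london = dict(
    pi={"Sunny": 0.55, "Rainy": 0.45},
    A={"Sunny": {"Sunny": 0.75, "Rainy": 0.25},
       "Rainy": {"Sunny": 0.3, "Rainy": 0.7}},
    B={"Sunny": {"L": 0.7, "#": 0.3}, "Rainy": {"L": 0.2, "#": 0.8}},
)
athens = dict(
    pi={"Sunny": 0.87, "Rainy": 0.13},
    A={"Sunny": {"Sunny": 0.9, "Rainy": 0.1},
       "Rainy": {"Sunny": 0.67, "Rainy": 0.33}},
    B={"Sunny": {"L": 0.2, "#": 0.8}, "Rainy": {"L": 0.05, "#": 0.95}},
)

report = "###L"
p_london = forward(K, outputs=report, **london)
p_athens = forward(K, outputs=report, **athens)
print("London:", p_london)
print("Athens:", p_athens)
print("More likely source:", "London" if p_london > p_athens else "Athens")
```

Under these parameter readings the Athens model assigns the higher total probability to ###L, so the evaluation problem would be answered with Athens.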

The  Complexity  of  the  Viterbi  and  the  Forward  Algorithms 

Analyzing the complexity of the Viterbi and the forward algorithms is straightforward.
In both cases, the outer loop of step 2 is executed once for each observed output, so |O|
times. Within that loop, the computation of candidatescore is done once for each state
pair. So if M has k states, it is done $k^2$ times. The computation of score/forwardscore
takes $\mathcal{O}(k)$ steps, as does the computation of backptr in the Viterbi algorithm. The final
operation of the Viterbi algorithm (computing the list of states to be returned) takes
$\mathcal{O}(|O|)$ steps. The final operation of the forward algorithm (computing the total
probability of producing the observed output) takes $\mathcal{O}(k)$ steps. So, in both cases, the total
time complexity is $\mathcal{O}(k^2 \cdot |O|)$.

An Example of How These Algorithms Work

The real power of HMMs is in solving complex, real-world problems in which
probability estimates can be derived from large datasets. So it is hard to illustrate the
effectiveness of HMMs on small problems, but the idea should be clear from the following
simple example of the use of the Viterbi algorithm.


EXAMPLE  5.37  Using  the  Viterbi  Algorithm  to  Guess  the  Weather 

Suppose that you are a state department official in a small country. Each day, you
receive a report from each of your consular offices telling you whether or not any
of your passports were reported missing that day. You know that the probability
of a passport getting lost or stolen is a function of the weather, since people tend
to stay inside (and thus manage to keep track of their passports) when the
weather is bad. But they tend to go out and thus risk getting their passport lost or stolen
if the weather is good. So it amuses you to try to infer the weather in your favorite
cities by watching the lost passport reports. We’ll use the symbol L to mean that a
passport was lost and the symbol # to mean that none was. So, for example, a
report for a week might look like LL##L###.

We’ll  consider  just  two  cities,  London  and  Athens.  We  can  build  an  HMM  for 
each.  Both  HMMs  have  two  states,  Sunny  and  Rainy. 


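[Figure: state diagrams of the London and Athens HMMs. Their parameters are:]

                     London    Athens

π[Sunny]             .55       .87
π[Rainy]             .45       .13

B[Sunny, L]          .7        .2
B[Sunny, #]          .3        .8
B[Rainy, L]          .2        .05
B[Rainy, #]          .8        .95

A[Sunny, Sunny]      .75       .9
A[Sunny, Rainy]      .25       .1
A[Rainy, Sunny]      .3        .67
A[Rainy, Rainy]      .7        .33

(The values A[Rainy, Rainy] = .7 for London and A[Sunny, Sunny] = .9 for Athens
are not legible in the figure; they follow from each row of A summing to 1.)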


Now  suppose  that  you  receive  the  report  ###L  from  London  and  you  want  to 
find  out  what  the  most  likely  sequence  of  weather  reports  was  for  those  days.  The 
Viterbi  algorithm  will  solve  the  problem. 

The easiest way to envision the way that Viterbi works is to imagine a lattice, in
which each column corresponds to a step and each row corresponds to a state in M:


The number shown at each point (q, t) is the value that Viterbi computes for
score[q, t]. So we can think of Viterbi as creating this lattice left to right, and filling
in scores as it goes along. The arrows represent possible transitions in M. The
heavy arrows indicate the path that is recorded in the matrix backptr.

At t = 0, the probabilities recorded in score are just the initial probabilities, as
given in π. So the sum of the values in column 1 is 1. At later steps, the sum is less
than 1 because we are considering only the probabilities of paths through M that
result in the observed output sequence. Other paths could have produced other
output sequences.

At all times t > 0, the values for score can be computed by considering the
probabilities at the previous time (as recorded in score), the probabilities of moving
from one state to another (as recorded in the matrix A), and the probabilities
(recorded in the matrix B) of observing the next output symbol. To see how the
Viterbi algorithm computes those values, let's compute the value of score[Sunny, 1]:

candidatescore[Sunny] = score[Sunny, 0] · A[Sunny, Sunny] · B[Sunny, #]
                      = .55 · .75 · .3
                      ≈ .12

candidatescore[Rainy] = score[Rainy, 0] · A[Rainy, Sunny] · B[Rainy, #]
                      = .45 · .3 · .8
                      ≈ .11

So score[Sunny, 1] = max(.12, .11) = .12, and backptr[Sunny, 1] is set to Sunny.

Once all the values of score have been computed, the final step is to observe
that Sunny is the most likely state for M to have reached just prior to reading a
fifth output symbol. The state that most likely preceded it is Sunny, so we report
Sunny as the last state to have produced output. Then we trace the backpointers
and report that the most likely sequence of weather reports is Rainy, Rainy,
Rainy, Sunny.

Now suppose that the fax machine was broken and the reports for last week
came in with the city names chopped off the top. You have received the report
###L and you want to know whether it is more likely that it came from London or
from Athens. To solve this problem, you use the forward algorithm. You run the
output sequence ###L through the London model and through the Athens model,
this time computing the total probability (as opposed to just the probability along
the best path) of reaching each state from any path that is consistent with the
output sequence. The most likely source of this report is the model with the highest
final probability.


So far, we have considered, as input to our machines, only strings of finite length. Thus
we have focused on problems for which we expect to write programs that read an
input, compute a result, and halt. Many problems are of that sort, but some are not. For
example, consider:

•  An  operating  system. 

•  An  air  traffic  control  system. 

•  A  factory  process  control  system. 

Ideally, such systems never halt. They should accept an infinite string of inputs and
continue to function. Define $\Sigma^\omega$ to be the set of infinite-length strings drawn from the
alphabet Σ. For the rest of this discussion, define a language to be a set of such
infinite-length strings.

To model the behavior of processes that do not halt, we can extend our notion of an
NDFSM to define a machine whose inputs are elements of Σ^ω. Such machines are
sometimes called ω-automata (or omega automata).

We'll define one particular kind of ω-automaton: A Büchi automaton is a quintuple
(K, Σ, Δ, S, A), where:

• K is a finite set of states.

• Σ is the input alphabet.

• S ⊆ K is a set of start states.

• A ⊆ K is the set of accepting states.

• Δ is the transition relation. It is a finite subset of:

(K × Σ) × K.

Note that, unlike NDFSMs, Büchi automata may have more than one start state.
Note also that the definition of a Büchi automaton does not allow ε-transitions.

We define configuration, initial configuration, yields-in-one-step, and yields exactly
as we did for NDFSMs. A computation of a Büchi automaton M is an infinite sequence
of configurations C0, C1, ... such that:

• C0 is an initial configuration, and

• C0 |-M C1 |-M C2 |-M ...

But now we must define what it means for a Büchi automaton M to accept a string.
We can no longer define acceptance by the state of M when it runs out of input, since it
won't. Instead, we'll say that M accepts a string w ∈ Σ^ω iff, in at least one of its
computations, there is some accepting state q such that, when processing w, M enters q
an infinite number of times. So note that it is not required that M enter an accepting state
and stay there. But it is not sufficient for M to enter an accepting state just once (or any
finite number of times). As before, the language accepted by M, denoted L(M), is the
set of all strings accepted by M. A language L is Büchi-acceptable iff it is accepted by
some Büchi automaton.
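
An infinite string cannot be fed to a program directly, but the acceptance condition can still be tested on ultimately periodic strings of the form u v^ω (a finite prefix u followed by a nonempty block v repeated forever). The following Python sketch does this; it is a minimal illustration, not a model-checking algorithm from the text. The dictionary encoding of the automaton, the function names, and the sample automaton `INF_A` (which accepts strings over {a, b} with infinitely many a's) are all assumptions made for the example.

```python
def step_pairs(delta, start, word, accepting):
    """All (state, hit) pairs reachable from `start` by reading `word`;
    hit is True iff some state entered along the way is accepting."""
    current = {(start, False)}
    for symbol in word:
        current = {(q, hit or q in accepting)
                   for (p, hit) in current
                   for q in delta.get((p, symbol), ())}
    return current


def on_accepting_cycle(edges, q):
    """True iff some walk of one or more v-blocks leads from q back to q
    and enters an accepting state along the way."""
    frontier, seen = [(q, False)], set()
    while frontier:
        p, hit = frontier.pop()
        for r, edge_hit in edges[p]:
            nxt = (r, hit or edge_hit)
            if nxt == (q, True):
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False


def accepts_lasso(buchi, u, v):
    """Does the Buchi automaton accept the infinite string u v v v ...?"""
    if not v:
        raise ValueError("the repeated block v must be nonempty")
    delta, accepting = buchi["delta"], buchi["accepting"]
    # edges[p]: where one full copy of v can take the machine from state p.
    edges = {p: step_pairs(delta, p, v, accepting) for p in buchi["states"]}
    # States the machine may be in once the finite prefix u has been read.
    reach = {q for s in buchi["starts"]
             for (q, _hit) in step_pairs(delta, s, u, accepting)}
    stack = list(reach)
    while stack:                      # close under reading further copies of v
        p = stack.pop()
        for r, _hit in edges[p]:
            if r not in reach:
                reach.add(r)
                stack.append(r)
    # Accept iff some reachable state repeats forever with accepting visits.
    return any(on_accepting_cycle(edges, q) for q in reach)


# Hypothetical example automaton: accepts strings with infinitely many a's.
INF_A = {
    "states": {1, 2},
    "starts": {1},
    "accepting": {2},
    "delta": {(1, "a"): {2}, (1, "b"): {1}, (2, "a"): {2}, (2, "b"): {1}},
}

if __name__ == "__main__":
    print(accepts_lasso(INF_A, "", "ab"))    # (ab)^ω
    print(accepts_lasso(INF_A, "aaa", "b"))  # aaa b^ω
```

The check relies on the observation that, since K is finite, an accepting computation on u v^ω must return to some state at the boundary between copies of v infinitely often, with an accepting state entered in between.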


Büchi automata can be used to model concurrent systems, hardware devices,
and their specifications. Then programs called model checkers can verify that
those systems correctly conform to a set of stated requirements. (H.1.2)


EXAMPLE 5.38 Büchi Automata for Event Sequences

Suppose that there are five kinds of events that can occur in the system that we
wish to model. We'll call them a, b, c, d, and e. So let Σ = {a, b, c, d, e}.

We first consider the case in which we require that event e occur at least once.
The following (nondeterministic) Büchi automaton accepts all and only the
elements of Σ^ω that contain at least one occurrence of e:


[Figure: Büchi automaton diagram; recovered edge labels: a,b,c,d and a,b,c,d,e]


Now suppose that we require that there come a point after which only e's can
occur. The following Büchi automaton (described using our convention that the
dead state need not be written explicitly) accepts all and only the elements of Σ^ω
that eventually reach a point after which no events other than e's occur:


[Figure: Büchi automaton diagram; recovered edge labels: a,b,c,d,e and e]


Finally, suppose that we require that every c event be immediately followed by
an e event. The following Büchi automaton (this time with the dead state, 3, shown
explicitly) accepts all and only the elements of Σ^ω that satisfy that requirement:

[Figure: Büchi automaton diagram; recovered edge label: a,b,d,e]


EXAMPLE 5.39 Mutual Exclusion

Suppose that we want to model a concurrent system with two processes and
enforce the constraint, often called a mutual exclusion property, that it never
happens that both processes are in their critical regions at the same time. We could
do this in the usual way, using an alphabet of atomic symbols such as {Both,
NotBoth}, where the system receives the input Both at any time interval at
which both processes are in their critical region and the input NotBoth at any
other time interval. But a more direct way to model the behavior of complex
concurrent systems is to allow inputs that correspond to Boolean expressions
that capture the properties of interest. That way, the same Boolean predicates
can be combined into different expressions in different machines that
correspond to different desirable properties. To capture the mutual exclusion
constraint, we'll use two Boolean predicates, CR0, which will be True iff process0 is in
its critical region and CR1, which will be True iff process1 is in its critical region.
The inputs to the system will then be drawn from a set of three Boolean
expressions: {(CR0 ∧ CR1), ¬(CR0 ∧ CR1), True}. The following Büchi automaton
accepts all and only the input sequences that satisfy the property that (CR0 ∧ CR1)
never occurs:


[Figure: Büchi automaton diagram; recovered edge labels: ¬(CR0 ∧ CR1), True, and (CR0 ∧ CR1)]


While there is an obvious similarity between Büchi automata and FSMs, and the
languages they accept are related, as described below, there is one important
difference. For Büchi automata, nondeterminism matters.

EXAMPLE 5.40 For Büchi Automata, Nondeterminism Matters

Let L = {w ∈ {a, b}^ω : #b(w) is finite}. Note that every string in L must
contain an infinite number of a's. The following nondeterministic Büchi
automaton accepts L:


[Figure: nondeterministic Büchi automaton diagram; recovered edge labels: a,b; a; a]


We can try to build a corresponding deterministic machine by using the
construction that we used in the proof of Theorem 5.3 (which says that for every NDFSM
there does exist an equivalent DFSM). The states of the new machine will then
correspond to subsets of states of the original machine and we'll have:


[Figure: deterministic machine produced by the subset construction; recovered edge labels: b, a, a, b]

This new machine is indeed deterministic and it does accept all strings in L.
Unfortunately, it also accepts an infinite number of strings that are not in L,
including (ba)^ω. More unfortunately, we cannot do any better.
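
The failure of the subset construction can be replayed mechanically. The Python sketch below assumes a particular reading of the nondeterministic machine, since the diagram itself is not reproduced here: state 1 is the start state and loops on a and b, an a moves from state 1 to state 2, and state 2 is accepting and loops on a. Under that assumption it builds the subset machine and runs it on ultimately periodic inputs u v^ω. The encoding and the function names are illustrative only.

```python
# Assumed reading of the nondeterministic automaton for L (finitely many b's).
NDELTA = {(1, "a"): {1, 2}, (1, "b"): {1}, (2, "a"): {2}}
NSTART, NACCEPTING, ALPHABET = 1, {2}, "ab"


def subset_construction(delta, start, accepting, alphabet):
    """Subset construction: states of the result are frozensets of states."""
    start_set = frozenset([start])
    d_delta, todo, seen = {}, [start_set], {start_set}
    while todo:
        current = todo.pop()
        for c in alphabet:
            nxt = frozenset(q for p in current for q in delta.get((p, c), ()))
            d_delta[(current, c)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    # A subset is accepting iff it contains an accepting state of the original.
    d_accepting = {s for s in seen if s & accepting}
    return d_delta, start_set, d_accepting


def det_accepts_lasso(d_delta, start, d_accepting, u, v):
    """Run a deterministic Buchi automaton on u v v v ... (v nonempty)."""
    state = start
    for c in u:
        state = d_delta[(state, c)]
    first_block = {}   # boundary state -> index of the v-block that began there
    hits = []          # hits[i]: did block i enter an accepting state?
    while state not in first_block:
        first_block[state] = len(hits)
        hit = False
        for c in v:
            state = d_delta[(state, c)]
            hit = hit or state in d_accepting
        hits.append(hit)
    # From the first repeated boundary state on, the same blocks recur forever.
    return any(hits[first_block[state]:])


d_delta, d_start, d_accepting = subset_construction(
    NDELTA, NSTART, NACCEPTING, ALPHABET)

if __name__ == "__main__":
    # (ba)^ω has infinitely many b's, so it is not in L.
    print(det_accepts_lasso(d_delta, d_start, d_accepting, "", "ba"))
    # bb a^ω has two b's, so it is in L.
    print(det_accepts_lasso(d_delta, d_start, d_accepting, "bb", "a"))
```

On (ba)^ω the subset machine alternates between {1} and {1, 2}, so it enters the accepting subset {1, 2} infinitely often even though the original machine has no accepting computation on that string.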


THEOREM 5.7 Nondeterministic versus Deterministic Büchi Automata

Theorem: There exist languages that can be accepted by a nondeterministic Büchi
automaton (i.e., one that meets the definition we have given), but for which there
exists no equivalent deterministic Büchi automaton (i.e., one that has a single
start state and whose transitions are defined by a function from (K × Σ) to K).

Proof: The proof is by a demonstration that no deterministic Büchi automaton
accepts the language L = {w ∈ {a, b}^ω : #b(w) is finite} of Example 5.40. Suppose
that there were such a machine B. Then, among the strings accepted by B, would
be every string of the form wa^ω, where w is some finite string in {a, b}*. This must
be true since all such strings contain only a finite number of b's. Remove from B
any states that are not reachable from the start state. Now consider any remaining
state q in B. Since q is reachable from the start state, there must exist at least one
finite string that drives B from the start state to q. Call that string w. Then, as we
just observed, wa^ω is in L and so must be accepted by B. In order for B to accept it,
there must be at least one accepting state q_a that occurs infinitely often in the
computation of B on wa^ω. That accepting state must be reachable from q (the state of B
when just w has been read) by some finite number, which we'll call a_q, of a's (since
B has only a finite number of states). Compute a_q for every state q in B. Let m be the
maximum of the a_q values.

We can now show that B accepts the string (ba^m)^ω, which is not in L. Since B is
deterministic, its transition function is defined on all (state, input) pairs, so it must
run forever on all strings including (ba^m)^ω. From the last paragraph we know that,
from any state, there is a string of m or fewer a's that can drive B to an accepting
state. So, in particular, after each time it reads a b, followed by a sequence of a's, B
must reach some accepting state within m a's. But B has only a finite number of
accepting states. So, on input (ba^m)^ω, B reaches some accepting state an infinite
number of times and it accepts.


There is a natural relationship between the languages of infinite strings accepted
by Büchi automata and the regular languages (i.e., the languages of finite strings
accepted by FSMs). To describe this relationship requires an understanding of the
closure properties of the regular languages that we will present in Section 8.3, as well as
some of the decision procedures for regular languages that we will present in Chapter 9.
It would be helpful to read those sections before continuing to read this discussion of
Büchi automata.

Any Büchi-acceptable language can be described in terms of regular languages. To
see how, observe that any Büchi automaton B can almost be viewed as an FSM, if we
simply consider input strings of finite length. The only reason that that can't quite be
done is that Büchi automata may have multiple start states. So, from any Büchi
automaton B, we can build what we'll call the mirror FSM M to B as follows: Let M = B
except that, if B has more than one start state, then, in M, create a new start state that
has an ε-transition to each of the start states of B. Notice that the set of finite length
strings that can drive B from a start state to some state q is identical to the set of finite
length strings that can drive M from its start state to state q.

Now consider any Büchi automaton B and any string w that B accepts. Since w is
accepted, there is some accepting state in B that is visited an infinite number of times
while B processes w. Call that state q. (There may be more than one such state. Pick
one.) Then we can divide w into two parts, x and y. The first part, x, has finite length
and it drives B from a start state to q for the first time. The second part, y, has infinite
length and it simply pushes B through one loop after another, each of which starts and
ends in q (although there may be more than one path that does this). The set of
possible values for x is regular: It is exactly the set that can be accepted by the FSM M that
mirrors B, if we let q be M's only accepting state. Call a path from q back to itself
minimal iff it does not pass through q. Then we also notice that the set of strings that
can force B through such a minimal path is also regular. It is the set accepted by the
FSM M that mirrors B, if we let q be both M's start state and its only accepting state.
These observations lead to the following theorem:

THEOREM 5.8 Büchi-Acceptable and Regular Languages

Theorem: L is a Büchi-acceptable language iff it is the finite union of sets each of
which is of the form XY^ω, where each X and Y is a regular language.

Proof: Given any Büchi automaton B = (K, Σ, Δ, S, A), let W_{q0 q1} be the set of all
strings that drive B from state q0 to state q1. Then, by the definition of what it
means for a Büchi automaton to accept a string, we have:

L(B) = ⋃_{s ∈ S, q ∈ A} W_sq (W_qq)^ω

If L is a Büchi-acceptable language, then there is some Büchi automaton B
that accepts it. So the only-if part of the claim is true since:

• S and A are both finite,

• For each s and q, W_sq is regular since it is the set of strings accepted by B's
mirror FSM M with start state s and single accepting state q,

• W_qq = Y*, where Y is the set of strings that can force B along a minimal path
from q back to q,

• Y is regular since it is the set of strings accepted by B's mirror FSM M with q
as its start state and its only accepting state, and

• The regular languages are closed under Kleene star so W_qq = Y* is also regular.

The if part follows from a set of properties of the Büchi-acceptable and regular
languages that are described in Theorem 5.9.


THEOREM 5.9 Closure Properties of Büchi Automata

Theorem and Proof: The Büchi-acceptable languages (like the regular languages)
are closed under:

• Concatenation with a regular language: If L1 is a regular language and L2 is a
Büchi-acceptable language, then L1L2 is Büchi-acceptable. The proof is
similar to the proof that the regular languages are closed under concatenation
except that, since ε-transitions are not allowed, the machines for the two languages
must be “glued together” differently. If q is a state in the FSM that accepts
L1, and there is a transition from q, labeled c, to some accepting state, then
add a transition from q, labeled c, to each start state of the Büchi automaton
that accepts L2.

• Union: If L1 and L2 are Büchi-acceptable, then L1 ∪ L2 is also
Büchi-acceptable. The proof is analogous to the proof that the regular languages are closed
under union. Again, since ε-transitions are not allowed, we must use a slightly
different glue. The new machine we will build will have transitions directly
from a new start state to the states that the original machines can reach after
reading one input character.

• Intersection: If L1 and L2 are Büchi-acceptable, then L1 ∩ L2 is also
Büchi-acceptable. The proof is by construction of a Büchi automaton that effectively
runs a Büchi automaton for L1 in parallel with one for L2.

• Complement: If L is Büchi-acceptable, then ¬L is also Büchi-acceptable. The
proof of this claim is less obvious. It is given in [Thomas 1990].

Further, if L is a regular language, then L^ω is Büchi-acceptable. The proof is
analogous to the proof that the regular languages are closed under Kleene star,
but we must again use the modification that was used above in the proof of
closure under concatenation.


Büchi automata are useful as models for computer systems whose properties we
wish to reason about because a set of important questions can be answered about
them. In particular, Büchi automata share with FSMs the existence of decision
procedures for all of the properties described in the following theorem:

THEOREM 5.10 Decision Procedures for Büchi Automata

Theorem: There exist decision procedures for all of the following properties:

• Emptiness: Given a Büchi automaton B, is L(B) empty?

• Nonemptiness: Given a Büchi automaton B, is L(B) nonempty?

• Inclusion: Given two Büchi automata B1 and B2, is L(B1) ⊆ L(B2)?

• Equivalence: Given two Büchi automata B1 and B2, is L(B1) = L(B2)?

Proof: The proof of each of these claims can be found in [Thomas 1990].
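
To give a feel for why emptiness and nonemptiness are decidable, here is a small Python sketch of one standard graph-based check. It is offered as an illustration, not as the procedure given in the cited reference. It uses the fact that L(B) is nonempty iff some accepting state is reachable from a start state and can also be reached again from itself by one or more transitions. The dictionary encoding and the two sample automata are assumptions made for the example.

```python
def successors(delta, p):
    """States reachable from p by exactly one transition (any input symbol)."""
    return {q for (src, _symbol), targets in delta.items() if src == p
            for q in targets}


def reachable(delta, sources):
    """States reachable from `sources` by one or more transitions."""
    found, stack = set(), list(sources)
    while stack:
        p = stack.pop()
        for q in successors(delta, p):
            if q not in found:
                found.add(q)
                stack.append(q)
    return found


def is_nonempty(buchi):
    """True iff the Buchi automaton accepts at least one infinite string."""
    delta = buchi["delta"]
    from_start = set(buchi["starts"]) | reachable(delta, buchi["starts"])
    # An accepting state must be reachable and must lie on a cycle.
    return any(q in from_start and q in reachable(delta, [q])
               for q in buchi["accepting"])


def is_empty(buchi):
    return not is_nonempty(buchi)


# Hypothetical automaton accepting strings with infinitely many a's.
INF_A = {
    "starts": {1},
    "accepting": {2},
    "delta": {(1, "a"): {2}, (1, "b"): {1}, (2, "a"): {2}, (2, "b"): {1}},
}

# Hypothetical automaton whose accepting state can be entered only once.
ONCE = {
    "starts": {1},
    "accepting": {2},
    "delta": {(1, "a"): {2}},
}

if __name__ == "__main__":
    print(is_nonempty(INF_A), is_empty(ONCE))
```

Because every transition consumes one input symbol, a path that reaches an accepting state and then cycles through it forever spells out an infinite string that the automaton accepts. Conversely, any accepting computation must revisit some accepting state, which yields such a cycle.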


Exercises 

1.  Give  a  clear  English  description  of  the  language  accepted  by  the  following  DFSM: 


[Figure: DFSM diagram not recoverable from the source]




2. Show a DFSM to accept each of the following languages:

a. {w ∈ {a, b}* : every a in w is immediately preceded and followed by b}.

b. {w ∈ {a, b}* : w does not end in ba}.

c. {w ∈ {0, 1}* : w corresponds to the binary encoding, without leading 0's, of
natural numbers that are evenly divisible by 4}.

d. {w ∈ {0, 1}* : w corresponds to the binary encoding, without leading 0's, of
natural numbers that are powers of 4}.

e. {w ∈ {0-9}* : w corresponds to the decimal encoding, without leading 0's, of an
odd natural number}.

f. {w ∈ {0, 1}* : w has 001 as a substring}.

g. {w ∈ {0, 1}* : w does not have 001 as a substring}.

h. {w ∈ {a, b}* : w has bbab as a substring}.

i. {w ∈ {a, b}* : w has neither ab nor bb as a substring}.

j. {w ∈ {a, b}* : w has both aa and bb as substrings}.

k. {w ∈ {a, b}* : w contains at least two b's that are not immediately followed
by an a}.

l. {w ∈ {0, 1}* : w has no more than one pair of consecutive 0's and no more
than one pair of consecutive 1's}.

m. {w ∈ {0, 1}* : none of the prefixes of w ends in 0}.

n. {w ∈ {a, b}* : (#a(w) + 2·#b(w)) ≡_5 0}. (#a(w) is the number of a's in w.)

3. Consider the children's game Rock, Paper, Scissors. We'll say that the first player
to win two rounds wins the game. Call the two players A and B.

a. Define an alphabet Σ and describe a technique for encoding Rock, Paper, Scissors
games as strings over Σ. (Hint: Each symbol in Σ should correspond to an ordered
pair that describes the simultaneous actions of A and B.)

b. Let L_RPS be the language of Rock, Paper, Scissors games, encoded as strings as
described in part (a), that correspond to wins for player A. Show a DFSM that
accepts L_RPS.

4. If M is a DFSM and ε ∈ L(M), what simple property must be true of M?

5.  Consider  the  following  NDFSM  M: 


[Figure: NDFSM M; recovered edge labels: a, a]

For each of the following strings w, determine whether w ∈ L(M):

a. aabbba.

b.  bab. 

c.  baba. 

6. Show a possibly nondeterministic FSM to accept each of the following languages:

a. {a^n b a^m : n, m ≥ 0, n ≡_3 m}.

b. {w ∈ {a, b}* : w contains at least one instance of aaba, bbb or ababa}.

c. {w ∈ {0-9}* : w corresponds to the decimal encoding of a natural number whose
encoding contains, as a substring, the encoding of a natural number that is
divisible by 3}.

d. {w ∈ {0, 1}* : w contains both 101 and 010 as substrings}.

e. {w ∈ {0, 1}* : w corresponds to the binary encoding of a positive integer that is
divisible by 16 or is odd}.

f. {w ∈ {a, b, c, d, e}* : |w| ≥ 2 and w begins and ends with the same symbol}.

7. Show an FSM (deterministic or nondeterministic) that accepts L = {w ∈ {a, b,
c}* : w contains at least one substring that consists of three identical symbols in a
row}. For example:

• The following strings are in L: aabbb, baacccbbb.

• The following strings are not in L: ε, aba, abababab, abcbcab.

8. Show a DFSM to accept each of the following languages. The point of this exercise
is to see how much harder it is to build a DFSM for tasks like these than it is to
build an NDFSM. So do not simply build an NDFSM and then convert it. But do,
after you build a DFSM, build an equivalent NDFSM.

a. {w ∈ {a, b}* : the fourth from the last character is a}.

b. {w ∈ {a, b}* : ∃x, y ∈ {a, b}* : ((w = x abbaa y) ∨ (w = x baba y))}.

9. For each of the following NDFSMs, use ndfsmtodfsm to construct an equivalent
DFSM. Begin by showing the value of eps(q) for each state q:

[Figures: NDFSM diagrams not recoverable from the source; surviving panel labels: (b), (c)]


10. Let M be the following NDFSM. Construct (using ndfsmtodfsm), a DFSM that
accepts ¬L(M).


11. For each of the following languages L:

(i) Describe the equivalence classes of ≈_L.

(ii) If the number of equivalence classes of ≈_L is finite, construct the minimal DFSM
that accepts L.

a. {w ∈ {0, 1}* : every 0 in w is immediately followed by the string 11}.

b. {w ∈ {0, 1}* : w has either an odd number of 1's and an odd number of 0's or
it has an even number of 1's and an even number of 0's}.

c. {w ∈ {a, b}* : w contains at least one occurrence of the string aababa}.

d. {ww^R : w ∈ {a, b}*}.

e. {w ∈ {a, b}* : w contains at least one a and ends in at least two b's}.

f. {w ∈ {0, 1}* : there is no occurrence of the substring 000 in w}.




12.  Let  M  be  the  following  DFSM.  Use  minDFSM  to  minimize  M. 


13. Construct a deterministic finite state transducer with input alphabet {a, b} for
each of the following tasks:

a. On input w, produce 1^n, where n = #a(w).

b. On input w, produce 1^n, where n = #a(w)/2.

c. On input w, produce 1^n, where n is the number of occurrences of the substring
aba in w.

14.  Construct  a  deterministic  finite  state  transducer  that  could  serve  as  the  controller 
for  an  elevator.  Clearly  describe  the  input  and  output  alphabets,  as  well  as  the 
states  and  the  transitions  between  them. 

15. Consider the problem of counting the number of words in a text file that may
contain letters plus any of the following non-letter characters:

<blank> <linefeed> <end-of-file> , . ;

Define a word to be a string of letters that is preceded by either the beginning
of the file or some non-letter character and that is followed by some non-letter
character. For example, there are 11 words in the following text:

The <blank><blank> cat <blank><linefeed>
saw <blank> the <blank><blank><blank> rat <linefeed>
<blank> with
<linefeed> a <blank> hat <linefeed>
on <blank> the <blank><blank> mat <end-of-file>

Describe a very simple finite-state transducer that reads the characters in the
file one at a time and solves the word-counting problem. Assume that there exists
an output symbol with the property that, every time it is generated, an external
counter gets incremented.

16. Real traffic light controllers are more complex than the one that we drew in
Example 5.29.

a. Consider an intersection of two roads controlled by a set of four lights (one in
each direction). Don't worry about allowing for a special left-turn signal.
Design a controller for this four-light system.

b. As an emergency vehicle approaches an intersection, it should be able to send
a signal that will cause the light in its direction to turn green and the light in the
cross direction to turn yellow and then red. Modify your design to allow this.

17. Real bar code systems are more complex than the one that we sketched in
Example 5.31. They must be able to encode all ten digits, for example. There are
several industry-standard formats for bar codes, including the common UPC code
found on nearly everything we buy. Describe a finite state transducer that reads
the bars and outputs the corresponding decimal number.


18. Extend the description of the Soundex FSM that was started in Example 5.33 so
that it can assign a code to the name Pfifer. Remember that you must take into
account the fact that every Soundex code is made up of exactly four characters.

19. Consider the weather/passport HMM of Example 5.37. Trace the execution of the
Viterbi and forward algorithms to answer the following questions:

a. Suppose that the report ###L is received from Athens. What was the most
likely weather during the time of the report?

b. Is it more likely that ###L came from London or from Athens?

20. Construct a Büchi automaton to accept each of the following languages of infinite
length strings:

a. {w ∈ {a, b, c}^ω : after any occurrence of an a there is eventually an occurrence
of a b}.

b. {w ∈ {a, b, c}^ω : between any two consecutive a's there is an odd number of b's}.

c. {w ∈ {a, b, c}^ω : there never comes a time after which no b's occur}.

21. In H.2, we describe the use of statecharts as a tool for building complex systems.
A statechart is a hierarchically structured transition network model. Statecharts
aren't the only tools that exploit this idea. Another is Simulink®, which is one
component of the larger programming environment MATLAB®. Use Simulink
to build an FSM simulator.

22. In I.1.2, we describe the Alternating Bit protocol for handling message
transmission in a network. Use the FSM that describes the sender to answer the
question, “Is there any upper bound on the number of times a message may be
retransmitted?”

23. In J.1, we show an FSM model of a simple intrusion detection device that could be
part of a building security system. Extend the model to allow the system to have two
zones that can be armed and disarmed independently of each other.

Regular Expressions


Let's now take a different approach to categorizing problems. Instead of
focusing on the power of a computing device, let's look at the task that we need to
perform. In particular, let's consider problems in which our goal is to match
finite or repeating patterns. For example, consider:

• The first step of compiling a program: This step is called lexical analysis. Its job is to
break the source code into meaningful units such as keywords, variables, and
numbers. For example, the string void may be a keyword, while the string 23E-12
should be recognized as a floating point number.

•  Filtering  email  for  spam. 

•  Sorting  email  into  appropriate  mailboxes  based  on  sender  and/or  content  words 
and  phrases. 

•  Searching  a  complex  directory  structure  by  specifying  patterns  that  are  known  to 
occur  in  the  file  we  want. 

In this chapter, we will define a simple pattern language. It has limitations. But its
strength, as we will soon see, is that we can implement pattern matching for this
language using finite state machines.


In his classic book, A Pattern Language, Christopher Alexander described
common patterns that can be found in successful buildings, towns and cities.
Software engineers read Alexander's work and realized that the same is true
of successful programs and systems. Patterns are ubiquitous in our world.

6.1 What is a Regular Expression?

The regular expression language that we are about to describe is built on an alphabet
that contains two kinds of symbols:

• A set of special symbols to which we will attach particular meanings when they
occur in a regular expression. These symbols are ∅, ∪, ε, (, ), *, and ⁺.

• An alphabet Σ, which contains the symbols that regular expressions will match
against.

A regular expression is a string that can be formed according to the following
rules:

1. ∅ is a regular expression.

2. ε is a regular expression.

3. Every element in Σ is a regular expression.

4. Given two regular expressions α and β, αβ is a regular expression.

5. Given two regular expressions α and β, α ∪ β is a regular expression.

6. Given a regular expression α, α* is a regular expression.

7. Given a regular expression α, α⁺ is a regular expression.

8. Given a regular expression α, (α) is a regular expression.

So, if we let Σ = {a, b}, the following strings are regular expressions:

∅, ε, a, b, (a ∪ b)*, abba ∪ ε.

The language of regular expressions, as we have just defined it, is useful because
every regular expression has a meaning (just like every English sentence and every
Java program). In the case of regular expressions, the meaning of a string is another
language. In other words, every string α (such as abba ∪ ε) in the regular expression
language has, as its meaning, some new language that contains exactly the strings that
match the pattern specified in α.

To make it possible to determine that meaning, we need to describe a semantic
interpretation function for regular expressions. Fortunately, the regular expressions
language is simple. So designing a compositional semantic interpretation function (as
defined in Section 2.2.6) for it is straightforward. As you read the definition that we are
about to present, it will become clear why we chose the particular symbol alphabet we
did. In particular, you will notice the similarity between the operations that are allowed
in regular expressions and the operations that we defined on languages in Section 2.2.
Define the following semantic interpretation function L for the language of regular
expressions:

1.  L  (0)  =  0.  the  language  that  contains  no  strings. 

2.  L  (e)  =  {e}.  the  language  that  contains  just  the  empty  siring. 

3.  For  any  ceS,L  (t)  =  {c},  the  language  that  contains  the  single,  one-character 
string  c. 
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4.  For  any  regular  expressions  a  and  p.  L  ( a/3 )  =  L(a)  L(p).  In  other  words,  to 
form  the  meaning  of  the  concatenation  of  two  regular  expressions,  first  deter¬ 
mine  the  meaning  of  each  of  the  constituents.  Both  meanings  will  be  languages. 
Then  concatenate  the  two  languages  together.  Recall  that  the  concatenation  of 
two  languages  L\  and  L?  is  {w  -  xy ,  where  .veL|  and  yeLi).  Note  that,  if  ei¬ 
ther  L  (a)  or  L  (P)  is  equal  to  0,  then  the  concatenation  will  also  be  equal  to  0. 

5. For any regular expressions α and β, L(α ∪ β) = L(α) ∪ L(β). Again we form
the meaning of the larger expression by first determining the meaning of each of
the constituents. Each of them is a language. The meaning of α ∪ β then, as suggested
by our choice of the character ∪ as an operator, is the union of the two
constituent languages.

6. For any regular expression α, L(α*) = (L(α))*, where * is the Kleene star operator
defined in Section 2.2.5. So L(α*) is the language that is formed by concatenating
together zero or more strings drawn from L(α).

7. For any regular expression α, L(α⁺) = L(αα*) = L(α)(L(α))*. If L(α) is
equal to ∅, then L(α⁺) is also equal to ∅. Otherwise L(α⁺) is the language that
is formed by concatenating together one or more strings drawn from L(α).

8. For any regular expression α, L((α)) = L(α). In other words, parentheses have
no effect on meaning except to group the constituents in an expression.

If the meaning of a regular expression α is the language L, then we say that α defines
or describes L.
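
The eight rules can be tried out directly. The following is a minimal Python sketch, not part of the original text: it assumes a hypothetical encoding of regular expressions as nested tuples, and, because most of the languages involved are infinite, it computes only the strings of L(α) up to a chosen length. Rule 8 needs no case of its own here, since the nesting of the tuples plays the role of parentheses.

```python
# Hypothetical encoding (an assumption of this sketch, not notation from the text):
#   ("empty",)          the symbol for the empty language
#   ("eps",)            the symbol for the empty string
#   ("char", c)         a single character c
#   ("cat", r1, r2)     concatenation
#   ("union", r1, r2)   union
#   ("star", r)         Kleene star
#   ("plus", r)         one or more repetitions

def meaning(regex, max_len):
    """Return the strings of L(regex) whose length is at most max_len."""
    kind = regex[0]
    if kind == "empty":                      # rule 1
        return set()
    if kind == "eps":                        # rule 2
        return {""}
    if kind == "char":                       # rule 3
        return {regex[1]} if max_len >= 1 else set()
    if kind == "cat":                        # rule 4
        left = meaning(regex[1], max_len)
        right = meaning(regex[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "union":                      # rule 5
        return meaning(regex[1], max_len) | meaning(regex[2], max_len)
    if kind == "star":                       # rule 6
        base = meaning(regex[1], max_len)
        result = {""}                        # zero strings concatenated
        frontier = {""}
        while frontier:                      # add one more string from base each round
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    if kind == "plus":                       # rule 7, via the equivalent concatenation
        return meaning(("cat", regex[1], ("star", regex[1])), max_len)
    raise ValueError(f"unknown regular expression node: {kind}")


# (a U b)*b, restricted to strings of length at most 2
ends_in_b = ("cat", ("star", ("union", ("char", "a"), ("char", "b"))), ("char", "b"))
print(sorted(meaning(ends_in_b, 2)))
```

Working from the smallest subexpressions outward, exactly as the rules do, is what makes the function compositional: each case only combines the meanings of its constituents.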

The  definition  that  we  have  just  given  for  the  regular  expression  language  contains 
three  kinds  of  rules: 

• Rules 1, 3, 4, 5, and 6 give the language its power to define sets, starting with the
basic sets defined by rules 1 and 3, and then building larger sets using the operators
defined by rules 4, 5, and 6.

•  Rule  8  has  as  its  only  role  grouping  other  operators. 

• Rules 2 and 7 appear to add functionality to the regular expression language. But in
fact they don't: they serve only to provide convenient shorthands for languages
that can be defined using only rules 1, 3-6, and 8. Let's see why.

First consider rule 2: The language of regular expressions does not need the symbol ε
because it has an alternative mechanism for describing L(ε). Observe that L(∅*) = {w : w
is formed by concatenating together zero or more strings from ∅}. But how many ways are
there to concatenate together zero or more strings from ∅? If we select zero strings to
concatenate, we get ε. We cannot select more than zero since there aren't any to choose from.
So L(∅*) = {ε}. Thus, whenever we would like to write ε, we could instead write ∅*. It
is much clearer to write ε, and we shall. But, whenever we wish to make a formal statement
about regular expressions or the languages they define, we need not consider rule 2 since
we can rewrite any regular expression that contains ε as an equivalent one that contains ∅*
instead.

Next consider rule 7: As we showed in the statement of rule 7 itself, the regular
expression α⁺ is equivalent to the slightly longer regular expression αα*. The form α⁺ is a
convenient shortcut, and we will use it. But we need not consider rule 7 in any analysis
that we may choose to do of regular expressions or the languages that they generate.

The  compositional  semantic  interpretation  function  that  we  just  defined  lets  us 
map  between  regular  expressions  and  the  languages  that  they  define.  We  begin  by 
analyzing  the  smallest  subexpressions  and  then  work  outward  to  larger  and  larger 
expressions. 


EXAMPLE  6.1  Analyzing  a  Simple  Regular  Expression 

L((a ∪ b)*b) = L((a ∪ b)*) L(b)
             = (L((a ∪ b)))* L(b)
             = (L(a) ∪ L(b))* L(b)
             = ({a} ∪ {b})* {b}
             = {a, b}* {b}.

So the meaning of the regular expression (a ∪ b)*b is the set of all strings over
the alphabet {a, b} that end in b.


One straightforward way to read a regular expression and determine its meaning is
to imagine it as a procedure that generates strings. Read it left to right and imagine it
generating a string left to right. As you are doing that, think of any expression that is
enclosed in a Kleene star as a loop that can be executed zero or more times. Each time
through the loop, choose any one of the alternatives listed in the expression. So we can
read the regular expression of the last example, (a ∪ b)*b, as, “Go through a loop zero
or more times, picking a single a or b each time. Then concatenate b.” Any string that
can be generated by this procedure is in L((a ∪ b)*b).
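
The generating-procedure reading can be written down almost word for word. This is a small Python sketch under stated assumptions: the function name generate, the cap max_loops on the number of loop iterations, and the fixed random seed are all choices made here for illustration.

```python
import random

def generate(rng, max_loops=5):
    """Generate one string of L((a U b)*b) by following the loop reading."""
    loops = rng.randint(0, max_loops)                         # go through the loop zero or more times
    prefix = "".join(rng.choice("ab") for _ in range(loops))  # pick a single a or b each time
    return prefix + "b"                                       # then concatenate b

rng = random.Random(0)          # seeded only so that repeated runs agree
samples = [generate(rng) for _ in range(5)]
print(samples)
```

Every string produced this way ends in b, which agrees with the meaning derived in Example 6.1.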


Regular  expressions  can  be  used  to  scan  text  and  pick  out  email  addresses. 

(0.2) 


EXAMPLE  6.2  Another  Simple  Regular  Expression 

L(((a ∪ b)(a ∪ b))a(a ∪ b)*) = L(((a ∪ b)(a ∪ b))) L(a) L((a ∪ b)*)
                             = L((a ∪ b)(a ∪ b)) {a} (L((a ∪ b)))*
                             = L((a ∪ b)) L((a ∪ b)) {a} {a, b}*
                             = {a, b} {a, b} {a} {a, b}*.

So the meaning of the regular expression ((a ∪ b)(a ∪ b))a(a ∪ b)* is:

{xay : x and y are strings of a's and b's and |x| = 2}.

Alternatively, it is the language that contains all strings of a's and b's such that
there exists a third character and it is an a.


EXAMPLE  6.3  Given  a  Language,  Find  a  Regular  Expression 

Let L = {w ∈ {a, b}* : |w| is even}. There are two simple regular expressions,
both of which define L:

((a ∪ b)(a ∪ b))*       This one can be read as, “Go through a loop zero or more times.
                        Each time through, choose an a or b, then choose a second
                        character (a or b).”

(aa ∪ ab ∪ ba ∪ bb)*    This one can be read as, “Go through a loop zero or more times.
                        Each time through, choose one of the two-character sequences.”

From this example, it is clear that the semantic interpretation function we have
defined for regular expressions is not one-to-one. In fact, given any language L, if there is
one regular expression that defines it, there is an infinite number that do. This is trivially
true since, for any regular expression α, the regular expression α ∪ α defines the same
language α does.

Recall from our discussion in Section 2.2.6 that this is not unusual. Semantic
interpretation functions for English and for Java are not one-to-one. The practical
consequence of this phenomenon for regular expressions is that, if we are trying to design a
regular expression that describes some particular language, there will be more than
one right answer. We will generally seek the simplest one that works, both for clarity
and to make pattern matching fast.


EXAMPLE  6.4  More  than  One  Regular  Expression  for  a  Language 

Let L = {w ∈ {a, b}* : w contains an odd number of a's}. Two equally simple
regular expressions that define L are:

b* (ab*ab*)* a b*.

b* a b* (ab*ab*)*.

Both of these expressions require that there be a single a somewhere. There can
also be other a's, but they must occur in pairs, so the result is an odd number of a's.
In the first expression, the last a in the string is viewed as the required “odd a”. In
the second, the first a plays that role.
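
One way to gain confidence that two regular expressions define the same language is to compare them on every string up to some length. The sketch below does this in Python for the two expressions of Example 6.4. It assumes Python's re syntax (where | plays the role of ∪), and the helper name language_up_to and the length bound 6 are choices made here. Agreement on short strings is evidence, not a proof of equivalence.

```python
import re
from itertools import product

def language_up_to(pattern, alphabet, max_len):
    """Return all strings over alphabet, of length <= max_len, that pattern matches in full."""
    compiled = re.compile(pattern)
    return {"".join(chars)
            for n in range(max_len + 1)
            for chars in product(alphabet, repeat=n)
            if compiled.fullmatch("".join(chars))}

first = "b*(ab*ab*)*ab*"      # the last a is the required odd a
second = "b*ab*(ab*ab*)*"     # the first a is the required odd a

# Reference set: strings with an odd number of a's, computed without any regular expression.
odd_as = {w for w in language_up_to("(a|b)*", "ab", 6) if w.count("a") % 2 == 1}

print(language_up_to(first, "ab", 6) == language_up_to(second, "ab", 6) == odd_as)
```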


The  regular  expression  language  that  we  have  just  defined  provides  three  operators. 
We  will  assign  the  following  precedence  order  to  them  (from  highest  to  lowest): 

1.  Kleene  star. 

2.  concatenation,  and 

3.  union. 

So the expression (a ∪ bb*a) will be interpreted as (a ∪ (b(b*a))).

All useful languages have idioms: common phrases that correspond to common
meanings. Regular expressions are no exception. In writing them, we will often use the
following:

(α ∪ ε)     Can be read as “optional α”, since the expression can be satisfied either by
            matching α or by matching the empty string.

(a ∪ b)*    Describes the set of all strings composed of the characters a and b. More
            generally, given any alphabet Σ = {c1, c2, ..., cn}, the language Σ* is
            described by the regular expression:

            (c1 ∪ c2 ∪ ... ∪ cn)*.


When writing regular expressions, the details matter. For example:

a* ∪ b* ≠ (a ∪ b)*    The language on the right contains the string ab, while the language
                      on the left does not. Every string in the language on the left
                      contains only a's or only b's.

(ab)* ≠ a*b*          The language on the left contains the string abab, while the language
                      on the right does not. The language on the right contains the string
                      aaabbbb, while the language on the left does not.
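
These witness strings are easy to check mechanically. A short Python sketch, assuming Python's re syntax with | standing in for ∪ (the helper name matches is introduced here for illustration):

```python
import re

def matches(pattern, w):
    """True if the whole string w belongs to the language of pattern."""
    return re.fullmatch(pattern, w) is not None

# a* U b* versus (a U b)*: the string ab separates them.
print(matches("a*|b*", "ab"), matches("(a|b)*", "ab"))

# (ab)* versus a*b*: abab and aaabbbb separate them in opposite directions.
print(matches("(ab)*", "abab"), matches("a*b*", "abab"))
print(matches("(ab)*", "aaabbbb"), matches("a*b*", "aaabbbb"))
```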


The regular expression a* is simply a string. It is different from the language L(a*)
= {w : w is composed of zero or more a's}. However, when no confusion will result,
we will use regular expressions to stand for the languages that they describe and we
will no longer write the semantic interpretation function explicitly. So we will be able
to say things like, “The language a* is infinite.”
6.2 Kleene's Theorem

The regular expression language that we have just described is significant for two
reasons:

• It is a useful way to define patterns.

• The languages that can be defined with regular expressions are, as the name perhaps
suggests, exactly the regular languages. In other words, any language that can
be defined by a regular expression can be accepted by some finite state machine.
And any language that can be accepted by a finite state machine can be defined by
some regular expression.

In this section, we will state and prove as a theorem the claim that we just made:
The class of languages that can be defined with regular expressions is exactly the regular
languages. This is the first of several claims of this sort that we will make in this
book. In each case, we will assert that some set A is identical to some very different
looking set B. The proof strategy that we will use in all of these cases is the same. We
will first prove that every element of A is also an element of B. We will then prove
that every element of B is also an element of A. Thus, since A and B contain the same
elements, they are the same set.

6.2.1 Building an FSM from a Regular Expression

THEOREM  6.1  For  Every  Regular  Expression  There  is  an  Equivalent  FSM 

Theorem:  Any  language  that  can  be  defined  with  a  regular  expression  can  be 
accepted  by  some  FSM  and  so  is  regular. 

Proof: The proof is by construction. We will show that, given a regular expression α,
we can construct an FSM M such that L(α) = L(M).

We first show that there exists an FSM that corresponds to each primitive regular
expression:

• If α is any c ∈ Σ, we construct for it the simple FSM shown in Figure 6.1 (a).

• If α is ∅, we construct for it the simple FSM shown in Figure 6.1 (b).

• Although it's not strictly necessary to consider ε since it has the same meaning
as ∅*, we'll do so since we don't usually think of it that way. So, if α is ε, we
construct for it the simple FSM shown in Figure 6.1 (c).

Next we must show how to build FSMs to accept languages that are defined by
regular expressions that exploit the operations of concatenation, union, and Kleene
star. Let β and γ be regular expressions that define languages over the alphabet Σ.
If L(β) is regular, then it is accepted by some FSM M1 = (K1, Σ, δ1, s1, A1). If
L(γ) is regular, then it is accepted by some FSM M2 = (K2, Σ, δ2, s2, A2).




FIGURE 6.1 FSMs for primitive regular expressions: (a) a single character c, (b) ∅, (c) ε.

If α is the regular expression β ∪ γ and if both L(β) and L(γ) are regular, then we
construct M3 = (K3, Σ, δ3, s3, A3) such that L(M3) = L(α) = L(β) ∪ L(γ). If
necessary, rename the states of M1 and M2 so that K1 ∩ K2 = ∅. Create a new start
state, s3, and connect it to the start states of M1 and M2 via ε-transitions. M3 accepts
iff either M1 or M2 accepts. So M3 = ({s3} ∪ K1 ∪ K2, Σ, δ3, s3, A1 ∪ A2), where
δ3 = δ1 ∪ δ2 ∪ {((s3, ε), s1), ((s3, ε), s2)}.

If α is the regular expression βγ and if both L(β) and L(γ) are regular, then we
construct M3 = (K3, Σ, δ3, s3, A3) such that L(M3) = L(α) = L(β)L(γ). If
necessary, rename the states of M1 and M2 so that K1 ∩ K2 = ∅. We will build
M3 by connecting every accepting state of M1 to the start state of M2 via an
ε-transition. M3 will start in the start state of M1 and will accept iff M2 does. So
M3 = (K1 ∪ K2, Σ, δ3, s1, A2), where δ3 = δ1 ∪ δ2 ∪ {((q, ε), s2) : q ∈ A1}.

If α is the regular expression β* and if L(β) is regular, then we construct
M2 = (K2, Σ, δ2, s2, A2) such that L(M2) = L(α) = L(β)*. We will create a
new start state s2 and make it accepting, thus assuring that M2 accepts ε. (We
need a new start state because it is possible that s1, the start state of M1, is not
an accepting state. If it isn't and if it is reachable via any input string other than
ε, then simply making it an accepting state would cause M2 to accept strings
that are not in (L(M1))*.) We link the new s2 to s1 via an ε-transition. Finally,
we create ε-transitions from each of M1's accepting states back to s1. So
M2 = ({s2} ∪ K1, Σ, δ2, s2, {s2} ∪ A1), where δ2 = δ1 ∪ {((s2, ε), s1)} ∪
{((q, ε), s1) : q ∈ A1}.


Notice that the machines that these constructions build are typically highly
nondeterministic because of their use of ε-transitions. They also typically have a
large number of unnecessary states. But, as a practical matter, that is not a problem
since, given an arbitrary NDFSM M, we have an algorithm that can construct
an equivalent DFSM M′. We also have an algorithm that can minimize M′.

Based on the constructions that have just been described, we can define the
following algorithm to construct, given a regular expression α, a corresponding
(usually nondeterministic) FSM:
regextofsm(α: regular expression) =

Beginning with the primitive subexpressions of α and working outwards
until an FSM for all of α has been built do:

Construct an FSM as described above.
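
The constructions in the proof translate fairly directly into code. The following Python sketch is one possible rendering, not a definitive implementation. It assumes regular expressions are given as nested tuples such as ("union", r1, r2), represents a machine as a tuple (K, δ, s, A) with δ a set of ((state, label), state) entries, and uses the empty string as the label of an ε-transition. The helper accepts, which simulates the nondeterministic machine, is added here only so that the result can be tested.

```python
from itertools import count

_fresh = count()   # supplies unused state names, so state sets never overlap

def fsm_char(c):                       # Figure 6.1 (a)
    s, f = next(_fresh), next(_fresh)
    return ({s, f}, {((s, c), f)}, s, {f})

def fsm_empty():                       # Figure 6.1 (b): one nonaccepting state
    s = next(_fresh)
    return ({s}, set(), s, set())

def fsm_eps():                         # Figure 6.1 (c): one accepting state
    s = next(_fresh)
    return ({s}, set(), s, {s})

def fsm_union(m1, m2):
    (k1, d1, s1, a1), (k2, d2, s2, a2) = m1, m2
    s3 = next(_fresh)                  # new start state with eps-moves to both machines
    return ({s3} | k1 | k2, d1 | d2 | {((s3, ""), s1), ((s3, ""), s2)}, s3, a1 | a2)

def fsm_concat(m1, m2):
    (k1, d1, s1, a1), (k2, d2, s2, a2) = m1, m2
    return (k1 | k2, d1 | d2 | {((q, ""), s2) for q in a1}, s1, a2)

def fsm_star(m1):
    k1, d1, s1, a1 = m1
    s2 = next(_fresh)                  # new accepting start state
    d2 = d1 | {((s2, ""), s1)} | {((q, ""), s1) for q in a1}
    return ({s2} | k1, d2, s2, {s2} | a1)

def regextofsm(regex):
    """Build an FSM for regex, working outward from its primitive subexpressions."""
    kind = regex[0]
    if kind == "empty":
        return fsm_empty()
    if kind == "eps":
        return fsm_eps()
    if kind == "char":
        return fsm_char(regex[1])
    if kind == "union":
        return fsm_union(regextofsm(regex[1]), regextofsm(regex[2]))
    if kind == "cat":
        return fsm_concat(regextofsm(regex[1]), regextofsm(regex[2]))
    if kind == "star":
        return fsm_star(regextofsm(regex[1]))
    raise ValueError(f"unknown regular expression node: {kind}")

def eps_closure(states, delta):
    """All states reachable from states using only eps-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for (p, label), r in delta:
            if p == q and label == "" and r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def accepts(machine, w):
    """Simulate the nondeterministic machine on w by tracking the set of current states."""
    _, delta, start, accepting = machine
    current = eps_closure({start}, delta)
    for c in w:
        moved = {r for (p, label), r in delta if p in current and label == c}
        current = eps_closure(moved, delta)
    return bool(current & accepting)

# (b U ab)*
machine = regextofsm(("star", ("union", ("char", "b"),
                               ("cat", ("char", "a"), ("char", "b")))))
print(accepts(machine, "bab"), accepts(machine, "aab"))
```

Drawing state names from a single counter takes the place of the renaming step in the proof: the state sets of any two submachines are disjoint by construction.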


The fact that regular expressions can be transformed into executable finite
state machines is important. It means that people can specify programs as
regular expressions and then have those expressions “compiled” into efficient
processes. For example, hierarchically structured regular expressions,
with the same formal power as the regular expressions we have been working
with, can be used to describe a lightweight parser for analyzing legacy
software. (H.4.1)


EXAMPLE  6.5  Building  an  FSM  from  a  Regular  Expression 

Consider  the  regular  expression  (b  U  ab)*.  We  use  regextofsm  to  build  an  FSM 
that  accepts  the  language  defined  by  this  regular  expression: 


[Figure: FSMs built by regextofsm for the subexpressions b and a, combined step by
step into an FSM for (b ∪ ab)*.]


6.2.2  Building  a  Regular  Expression  from  an  FSM 

Next we must show that it is possible to go the other direction, namely to build, from an
FSM, a corresponding regular expression. The idea behind the algorithm that we are
about to present is the following: Instead of limiting the labels on the transitions of an
FSM to a single character or ε, we will allow entire regular expressions as labels. The
goal of the algorithm is to construct, from an input FSM M, an output machine M′ such
that M and M′ are equivalent and M′ has only two states, a start state and a single
accepting state. It will also have just one transition, which will go from its start state to its
accepting state. The label on that transition will be a regular expression that describes
all the strings that could have driven the original machine M from its start state to
some accepting state.


EXAMPLE  6.6  Building  an  Equivalent  Machine  M 
Let  M  be: 


We can build an equivalent machine M′ by ripping out q2 and replacing it by a
transition from q1 to q3 labeled with the regular expression ab*a. So M′ is:
Given an arbitrary FSM M, M′ will be built by starting with M and then removing,
one at a time, all the states that lie in between the start state and an accepting state. As
each such state is removed, the remaining transitions will be modified so that the set of
strings that can drive M′ from its start state to some accepting state remains unchanged.

The  following  algorithm  creates  a  regular  expression  that  defines  L(M),  provided 
that  step  6  can  be  executed  correctly: 

fsmtoregexheuristic(M:  FSM)  = 

1.  Remove  from  M  any  states  that  are  unreachable  from  the  start  state. 

2.  If  M  has  no  accepting  states  then  halt  and  return  the  simple  regular  expression  0. 

3. If the start state of M is part of a loop (i.e., it has any transitions coming into it),
create a new start state s and connect s to M's start state via an ε-transition. This
new start state s will have no transitions into it.

4. If there is more than one accepting state of M or if there is just one but there are
any transitions out of it, create a new accepting state and connect each of M's
accepting states to it via an ε-transition. Remove the old accepting states from the
set of accepting states. Note that the new accepting state will have no transitions
out from it.

5.  If,  at  this  point,  M  has  only  one  state,  then  that  state  is  both  the  start  state  and  the 
accepting  state  and  M  has  no  transitions.  So  L  (M  )  =  {e}.  Halt  and  return  the 
simple  regular  expression  e. 

6.  Until  only  the  start  state  and  the  accepting  state  remain  do: 

6.1.  Select  some  state  rip  of  M.  Any  state  except  the  start  state  or  the  accepting 
state  may  be  chosen. 

6.2.  Remove  rip  from  M. 

6.3.  Modify  the  transitions  among  the  remaining  states  so  that  M  accepts  the 
same  strings.  The  labels  on  the  rewritten  transitions  may  be  any  regular 
expression. 

7.  Return  the  regular  expression  that  labels  the  one  remaining  transition  from  the 
start  state  to  the  accepting  state. 


EXAMPLE  6.7  Building  a  Regular  Expression  from  an  FSM 

Let  M  be: 
Create a new start state and a new accepting state and link them to M:


Remove state 3:


Remove state 2:


Remove state 1:


(ab ∪ aaa*b)*(a ∪ ε)
EXAMPLE 6.8 A Simple FSM With No Simple Regular Expression

Let M be the FSM that we built in Example 5.9 for the language L = {w ∈ {a,
b}* : w contains an even number of a's and an odd number of b's}. M is:


Try to apply fsmtoregexheuristic to M. It will not be easy because it is not at all
obvious how to implement step 6.3. For example, if we attempt to remove state
[2], this changes not just the way that M can move from state [1] to state [4]. It also
changes, for example, the way that M can move from state [1] to state [3] because
it changes how M can move from state [1] back to itself.


To prove that for every FSM there exists a corresponding regular expression will
require a construction in which we make clearer what must be done each time a state is
removed and replaced by a regular expression. The algorithm that we are about to
describe has that property, although it comes at the expense of simplicity in easy cases
such as the one in Example 6.7.

THEOREM  6.2  For  Every  FSM  There  is  an  Equivalent  Regular  Expression 

Theorem:  Every  regular  language  (i.e..  every  language  that  can  be  accepted  by  some 
FSM)  can  be  defined  with  a  regular  expression. 

Proof: The proof is by construction. Given an FSM M = (K, Σ, δ, s, A), we can
construct a regular expression α such that L(M) = L(α).

As we did in fsmtoregexheuristic, we will begin by assuring that M has no
unreachable states and that it has a start state that has no transitions into it and a
single accepting state that has no transitions out from it. But now we will make a
further important modification to M before we start removing states: From every
state other than the accepting state there must be exactly one transition to every
state (including itself) except the start state. And into every state other than the
start state there must be exactly one transition from every state (including itself)
except the accepting state. To make this true, we do two things:

• If there is more than one transition between states p and q, collapse them into
a single transition. If the set of labels on the original set of such transitions is
{c1, c2, ..., cn}, then delete those transitions and replace them by a single
transition with the label c1 ∪ c2 ∪ ... ∪ cn. For example, consider the FSM
fragment shown in Figure 6.2(a). We must collapse the two transitions between
states 1 and 2. After doing so, we have the fragment shown in Figure 6.2(b).

FIGURE 6.2 Collapsing multiple transitions into one.

• If any of the required transitions are missing, add them. We can add all of
those transitions without changing L(M) by labeling all of the new transitions
with the regular expression ∅. So there is no string that will allow them to be
taken. For example, let M be the FSM shown in Figure 6.3(a). Several new
transitions are required. When we add them, we have the new FSM shown in
Figure 6.3(b).

FIGURE 6.3 Adding all the required transitions.
Now suppose that we select a state rip and remove it and the transitions into and
out of it. Then we must modify every remaining transition so that M's function stays
the same. Since M already contains a transition between each pair of states (except
the ones that are not allowed into and out of the start and accepting states), if all
those transitions are modified correctly then M's behavior will be correct.

So, suppose that we remove some state that we will call rip. How should the
remaining transitions be changed? Consider any pair of states p and q. Once we
remove rip, how can M get from p to q?

•  It  can  still  take  the  transition  that  went  directly  from  p  to  q,  or 

•  It  can  take  the  transition  from  p  to  rip. Then,  it  can  take  the  transition  from  rip 
back  to  itself  zero  or  more  times.  Then  it  can  take  the  transition  from  rip  to  q. 


Let R(p, q) be the regular expression that labels the transition in M from p to
q. Then, in the new machine M′ that will be created by removing rip, the new
regular expression that should label the transition from p to q is:

R(p, q)         /* Go directly from p to q,
∪               /* or
R(p, rip)       /* go from p to rip, then
R(rip, rip)*    /* go from rip back to itself any number of times, then
R(rip, q)       /* go from rip to q.

We'll denote this new regular expression R′(p, q). Writing it out without the
comments, we have:

R′(p, q) = R(p, q) ∪ R(p, rip)R(rip, rip)*R(rip, q).


EXAMPLE  6.9  Ripping  States  Out  One  at  a  Time 

Again,  let  M  be: 


Let  rip  be  state  2.  Then: 

R′(1, 3) = R(1, 3) ∪ R(1, rip)R(rip, rip)*R(rip, 3)
         = R(1, 3) ∪ R(1, 2)R(2, 2)*R(2, 3)
         = ∅ ∪ ab*a
         = ab*a.

Notice that ripping state 2 also changes another way the original machine had
to get from state 1 to state 3: It could have gone from state 1 to state 4 to state 2
and then to state 3. But we don't have to worry about that in computing R′(1, 3).
The required change to that path will occur when we compute R′(4, 3).


When all states except the start state s and the accepting state a have been
removed, R(s, a) will describe the set of strings that can drive M from its start state
to its accepting state. So R(s, a) will describe L(M).

We can now define an algorithm to build, from any FSM M = (K, Σ, δ, s, A),
a regular expression that describes L(M). We'll use two subroutines: standardize,
which will convert M to the required form, and buildregex, which will construct,
from the modified machine M, the required regular expression.

standardize(M: FSM) =

1. Remove from M any states that are unreachable from the start state.

2. If the start state of M is part of a loop (i.e., it has any transitions coming into it),
create a new start state s and connect s to M's start state via an ε-transition.

3. If there is more than one accepting state of M or if there is just one but
there are any transitions out of it, create a new accepting state and connect
each of M's accepting states to it via an ε-transition. Remove the old accepting
states from the set of accepting states.

4. If there is more than one transition between states p and q, collapse them
into a single transition.

5. If there is a pair of states p, q and there is no transition between them and
p is not the accepting state and q is not the start state, then create a transition
from p to q labeled ∅.

buildregex(M: FSM) =

1. If M has no accepting states, then halt and return the simple regular
expression ∅.

2. If M has only one state, then halt and return the simple regular expression ε.

3. Until only the start state and the accepting state remain do:

3.1. Select some state rip of M. Any state except the start state or the
accepting state may be chosen.

3.2. For every transition from some state p to some state q, if both p and q are
not rip then, using the current labels given by the expressions R, compute
the new label R′ for the transition from p to q using the formula:

R′(p, q) = R(p, q) ∪ R(p, rip)R(rip, rip)*R(rip, q).

3.3. Remove rip and all transitions into and out of it.

4. Return the regular expression that labels the one remaining transition
from the start state to the accepting state.

We can show that the new FSM that is built by standardize is equivalent to the
original machine (i.e., that they accept the same language) by showing that the
language that is accepted is preserved at each step of the procedure. We can show
that buildregex(M) builds a regular expression that correctly defines L(M) by
induction on the number of states that must be removed before it halts. Using those
two procedures, we can now define:

fsmtoregex(M: FSM) =

1. M′ = standardize(M).

2. Return buildregex(M′).
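
The state-removal procedure can be sketched in Python as follows. This is an illustrative sketch under several assumptions, not a transcription of the algorithm: it always adds a fresh start state and a fresh accepting state (rather than only when needed), it skips the removal of unreachable states (they cannot affect the final label), it represents ∅ by None and a missing transition by the absence of a dictionary entry, and it writes its result in Python's re syntax, where | stands for ∪ and (?:...) for grouping parentheses. The names new_start and new_accept are hypothetical and are assumed not to clash with existing state names.

```python
import re

EMPTY = None   # stands for the regular expression that matches no string

def union(r1, r2):
    if r1 is EMPTY:
        return r2
    if r2 is EMPTY:
        return r1
    return r1 if r1 == r2 else f"(?:{r1}|{r2})"

def concat(r1, r2):
    if r1 is EMPTY or r2 is EMPTY:     # concatenating with the empty language gives it back
        return EMPTY
    return r1 + r2

def star(r):
    if r is EMPTY or r == "":          # the star of the empty language is just the empty string
        return ""
    return f"(?:{r})*"

def fsmtoregex(states, delta, start, accepting):
    """Return a Python regular expression for the FSM's language, or None if it is empty.

    delta is a set of ((p, c), q) entries; c == "" marks an eps-transition.
    """
    s, f = "new_start", "new_accept"   # hypothetical fresh state names
    R = {}
    for (p, c), q in delta:            # collapse parallel transitions into one label
        R[(p, q)] = union(R.get((p, q), EMPTY), re.escape(c))
    R[(s, start)] = ""
    for q in accepting:
        R[(q, f)] = union(R.get((q, f), EMPTY), "")

    remaining = set(states)
    for rip in list(states):           # rip out the original states one at a time
        remaining.discard(rip)
        loop = star(R.get((rip, rip), EMPTY))
        for p in remaining | {s}:
            for q in remaining | {f}:
                detour = concat(concat(R.get((p, rip), EMPTY), loop),
                                R.get((rip, q), EMPTY))
                R[(p, q)] = union(R.get((p, q), EMPTY), detour)
    return R.get((s, f), EMPTY)

# A two-state DFSM that accepts strings over {a, b} with an odd number of a's.
odd_delta = {((0, "a"), 1), ((0, "b"), 0), ((1, "a"), 0), ((1, "b"), 1)}
odd_regex = fsmtoregex([0, 1], odd_delta, 0, {1})
print(odd_regex)
```

The expression that comes out is rarely the shortest one for the language, and it depends on the order in which states are removed. That is consistent with the earlier observation that the semantic interpretation function is not one-to-one.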

6.2.3  The  Equivalence  of  Regular  Expressions  and  FSMs 

The last two theorems enable us to prove the next one, due to Stephen Kleene.

THEOREM  6.3  Kleene's  Theorem 

Theorem: The class of languages that can be defined with regular expressions is
exactly the class of regular languages.

Proof: Theorem 6.1 says that every language that can be defined with a regular
expression is regular. Theorem 6.2 says that every regular language can be defined
by some regular expression.


6.2.4  Kleene's  Theorem,  Regular  Expressions,  and  Finite  State 
Machines 

Kleene's Theorem tells us that there is no difference between the formal power of regular
expressions and finite state machines. But, as some of the examples that we just
considered suggest, there is a practical difference in their effectiveness as problem solving tools:


• As we said in the introduction to this chapter, the regular expression language is a
pattern language. In particular, regular expressions must specify the order in which
a sequence of symbols must occur. This is useful when we want to describe patterns
such as phone numbers (it matters that the area code comes first) or email addresses
(it matters that the user name comes before the domain).

• But there are some applications where order doesn't matter. The vending machine
example that we considered at the beginning of Chapter 5 is an instance of this class
of problem. The order in which the coins were entered doesn't matter. Parity checking
is another. Only the total number of 1 bits matters, not where they occur in the
string. Finite state machines can be very effective in solving problems such as this.
But the regular expressions that correspond to those FSMs may be too complex to
be useful.

The bottom line is that sometimes it is easy to write a finite state machine to
describe a language. For other problems, it may be easier to write a regular expression.

Sometimes  Writing  Regular  Expressions  is  Easy 

Because, for some problems, regular expressions are easy to write, Kleene's Theorem is
useful. It gives us a second way to show that a language is regular. We need only show
a regular expression that defines it.


EXAMPLE  6.10  No  More  Than  One  b 

Let L = {w ∈ {a, b}* : there is no more than one b}. L is regular because it can
be described with the following regular expression:

a* (b ∪ ε) a*.


EXAMPLE  6.11  No  Two  Consecutive  Letters  are  the  Same 

Let L = {w ∈ {a, b}* : no two consecutive letters are the same}. L is regular
because it can be described with either of the following regular expressions:

(b ∪ ε) (ab)* (a ∪ ε).

(a ∪ ε) (ba)* (b ∪ ε).
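
The expressions of Examples 6.10 and 6.11 can be checked mechanically. The following is a minimal sketch, assuming Python's re dialect rather than the notation of this chapter: | plays the role of ∪ and an empty alternative plays the role of ε. The helper names (all_strings, agrees, and the two predicates) are ours, chosen for illustration. The check compares each expression with the English definition of its language on every string over {a, b} up to a small length bound, so agreement is evidence, not a proof.

```python
import re
from itertools import product


def all_strings(alphabet, max_len):
    """Yield every string over alphabet whose length is between 0 and max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


# Example 6.10: a* (b ∪ ε) a*
AT_MOST_ONE_B = re.compile(r"a*(?:b|)a*")
# Example 6.11: (b ∪ ε) (ab)* (a ∪ ε)
NO_REPEATS = re.compile(r"(?:b|)(?:ab)*(?:a|)")


def at_most_one_b(s):
    return s.count("b") <= 1


def no_two_consecutive_equal(s):
    return all(x != y for x, y in zip(s, s[1:]))


def agrees(pattern, predicate, max_len=8):
    """True if pattern and predicate classify every short string identically."""
    return all(bool(pattern.fullmatch(s)) == predicate(s)
               for s in all_strings("ab", max_len))
```

Calling agrees(AT_MOST_ONE_B, at_most_one_b) and agrees(NO_REPEATS, no_two_consecutive_equal) exercises both examples on all strings of length at most 8.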


EXAMPLE  6.12  Floating  Point  Numbers 

Consider again FLOAT, the language of floating point numbers that we described
in Example 5.7. Kleene's Theorem tells us that, since FLOAT is regular, there
must be some regular expression that describes it. In fact, regular expressions can
be used easily to describe languages like FLOAT. We'll use one shorthand. Let:

D stand for (0 ∪ 1 ∪ 2 ∪ 3 ∪ 4 ∪ 5 ∪ 6 ∪ 7 ∪ 8 ∪ 9).

Then FLOAT is the language described by the following regular expression:

(ε ∪ + ∪ -) D+ (ε ∪ .D+) (ε ∪ (E (ε ∪ + ∪ -) D+)).

It is useful to think of programs, queries, and other strings in practical
languages as being composed of a sequence of tokens, where a token is
the smallest string that has meaning. So variable and function names,
numbers and other constants, operators, and reserved words are all tokens.
The regular expression we just wrote for the language FLOAT describes
one kind of token. The first thing a compiler does, after reading
its input, is to divide it into tokens. That process is called lexical analysis.
It is common to use regular expressions to define the behavior of a lexical
analyzer. (G.4.1)
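
To connect the FLOAT expression to the way a lexical analyzer might test a single token, here is a minimal sketch in Python's re dialect. It assumes that every digit sequence in FLOAT is nonempty (D+), that the exponent marker is an uppercase E, and that a token must match in its entirety. The names FLOAT and is_float are ours.

```python
import re

# D stands for a single digit, so D+ becomes [0-9]+.
# (ε ∪ + ∪ -) D+ (ε ∪ .D+) (ε ∪ (E (ε ∪ + ∪ -) D+))
FLOAT = re.compile(r"[+-]?[0-9]+(?:\.[0-9]+)?(?:E[+-]?[0-9]+)?")


def is_float(token):
    """True if the whole token is a string of the language FLOAT."""
    return FLOAT.fullmatch(token) is not None
```

Here the optional sign (ε ∪ + ∪ -) is written [+-]?, and each (ε ∪ ...) group becomes an optional group (?: ... )?.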


Sometimes  Building  a  Deterministic  FSM  is  Easy 

Given an arbitrary regular expression, the general algorithms presented in the proof of
Theorem 6.1 will typically construct a highly nondeterministic FSM. But there is a useful
special case in which it is possible to construct a DFSM directly from a set of patterns.
Suppose that we are given a set K of n keywords and a text string s. We want to
find occurrences in s of the keywords in K. We can think of K as defining a language
that can be described by a regular expression of the form:

(Σ* (k1 ∪ k2 ∪ ... ∪ kn) Σ*)+.

In other words, we will accept any string in which at least one keyword occurs. For
some applications this will be good enough. For others, we may care which keyword was
matched. For yet others we'll want to find all substrings that match some keyword in K.


By  letting  the  keywords  correspond  to  sequences  of  amino  acids,  this  idea 
can  be  used  to  build  a  fast  search  engine  for  protein  databases.  (K.3) 


In any of these special cases, we can build a deterministic FSM M by first building a
decision tree out of the set of keywords and then adding arcs as necessary to tell M
what to do when it reaches a dead end branch of the tree. The following algorithm
builds an FSM that accepts any string that contains at least one of the specified
keywords:

buildkeywordFSM(K: set of keywords) =

1. Create a start state q0.

2. For each element k of K do:

Create a branch corresponding to k.

3. Create a set of transitions that describe what to do when a branch dies, either
because its complete pattern has been found or because the next character is not
the correct one to continue the pattern.

4. Make the states at the ends of each branch accepting.
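
The four steps above can be sketched in Python. This is one possible realization under stated assumptions, not the book's own code: states are named by the keyword prefix they represent, a branch that dies falls back to the longest suffix of the text read so far that is still a keyword prefix, and an accepting state loops to itself on every symbol because the machine only has to report that some keyword has occurred. The function names and the explicit alphabet parameter are ours.

```python
def build_keyword_fsm(keywords, alphabet):
    """Return (delta, start, accepting) for a DFSM that accepts every string
    over alphabet containing at least one keyword as a substring."""
    # Steps 1 and 2: one state per keyword prefix; "" is the start state.
    prefixes = {k[:i] for k in keywords for i in range(len(k) + 1)}
    # Step 4: a state is accepting if the text it represents ends in a keyword.
    accepting = {p for p in prefixes if any(p.endswith(k) for k in keywords)}
    delta = {}
    for p in prefixes:
        for c in alphabet:
            if p in accepting:
                delta[(p, c)] = p      # a keyword was found: stay accepting
                continue
            # Step 3: follow the branch if possible; otherwise drop leading
            # characters until what remains is again a keyword prefix.
            s = p + c
            while s not in prefixes:
                s = s[1:]
            delta[(p, c)] = s
    return delta, "", accepting


def accepts(fsm, text):
    """Run the DFSM on text, one transition per input character."""
    delta, state, accepting = fsm
    for c in text:
        state = delta[(state, c)]
    return state in accepting
```

For instance, build_keyword_fsm({"cat", "bat", "cab"}, "abct") yields a machine with eight states, one for each distinct prefix of the three keywords.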


EXAMPLE  6.13  Recognizing  a  Set  of  Keywords 

Consider the set of keywords {cat, bat, cab}. We can use buildkeywordFSM to
build a DFSM to accept strings that contain at least one of these keywords. We
begin by creating a start state and then a path to accept the first keyword, cat:


[Figure: the start state followed by a single branch that spells out cat.]


Next we add branches for the remaining keywords, bat and cab:


[Figure: the same machine with branches added for bat and cab.]


Finally, we add transitions that let the machine recover after a path dies:


[Figure: the completed DFSM, including the recovery transitions.]

6.3 Applications of Regular Expressions

Patterns  are  everywhere. 


Regular expressions can be matched against the subject fields of emails to
find at least some of the ones that are likely to be spam. (O.1)


Because patterns are everywhere, applications of regular expressions are everywhere.
Before we look at some specific examples, one important caveat is required: The term
regular expression is used in the modern computing world in a much more general
way than we have defined it here. Many programming languages and scripting systems
provide support for regular expression matching. Each of them has its own syntax.
They all have the basic operators union, concatenation, and Kleene star. They typically
have others as well. Many, for example, have a substitution operator so that, after a
pattern is successfully matched against a string, a new string can be produced. In many
cases, these other operators provide enough additional power that languages that are
not regular can be described. So, in discussing "regular expressions" or "regexes", it is
important to be clear exactly what definition is being used. In the rest of this book, we
will use the definition that we presented in Section 6.1, with two additions to be
described below, unless we clearly state that, for some particular purpose, we are going to
use a different definition.


The programming language Perl, for example, supports regular expression
matching. (Appendix O) In Exercise 6.19, we'll consider the formal power of
the Perl regular expression language.


Real applications need more than two or three characters. But we do not want to
have to write expressions like:

(a ∪ b ∪ c ∪ d ∪ e ∪ f ∪ g ∪ h ∪ i ∪ j ∪ k ∪ l ∪ m ∪ n ∪ o ∪ p ∪ q ∪
r ∪ s ∪ t ∪ u ∪ v ∪ w ∪ x ∪ y ∪ z).

It would be much more convenient to be able to write (a-z). So, in cases where there is
an agreed upon collating sequence, we will use the shorthand (α - ω) to mean
(α ∪ ... ∪ ω), where all the characters in the collating sequence between α and ω are
included in the union.


EXAMPLE  6.14  Decimal  Numbers 

The  following  regular  expression  matches  decimal  encodings  of  numbers: 

-? ([0-9]+(\.[0-9]*)? | \.[0-9]*)

In most standard regular expression dialects, the notation α? is equivalent
to (α ∪ ε). In other words, α is optional. So, in this example, the minus sign is
optional. So is the decimal point.

Because the symbol . has a special meaning in most regular expression dialects,
we must quote it when we want to match it as a literal character. The quote
character in most regular expression dialects is \.
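
As a concrete illustration of the ? operator and of quoting, here is a minimal sketch in Python's re dialect. It reads the expression of Example 6.14 as -? ([0-9]+(\.[0-9]*)? | \.[0-9]*) and drops the spaces, which serve only as layout. The names DECIMAL and is_decimal are ours.

```python
import re

# -? ([0-9]+(\.[0-9]*)? | \.[0-9]*), written without the layout spaces.
DECIMAL = re.compile(r"-?([0-9]+(\.[0-9]*)?|\.[0-9]*)")


def is_decimal(s):
    """True if the whole string matches the decimal-number expression."""
    return DECIMAL.fullmatch(s) is not None


# An unquoted . matches any single character, so this pattern accepts 3x14.
UNQUOTED = re.compile(r"3.14")
# A quoted \. matches only a literal decimal point.
QUOTED = re.compile(r"3\.14")
```

The last two patterns show why the quote character matters: UNQUOTED matches the string 3x14, while QUOTED does not.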


Meaningful "words" in protein sequences are called motifs. They can be
described with regular expressions. (K.3.2)


EXAMPLE  6.15  Legal  Passwords 

Consider the problem of determining whether a string is a legal password.
Suppose that we require that all passwords meet the following requirements:

• A password must begin with a letter.

• A password may contain only letters, numbers, and the underscore character.

• A password must contain at least four characters and no more than eight
characters.

The following regular expression describes the language of legal passwords.
The line breaks have no significance. We have used them just to make the
expression easier to read.


((a-z) ∪ (A-Z))
((a-z) ∪ (A-Z) ∪ (0-9) ∪ _)
((a-z) ∪ (A-Z) ∪ (0-9) ∪ _)
((a-z) ∪ (A-Z) ∪ (0-9) ∪ _)
((a-z) ∪ (A-Z) ∪ (0-9) ∪ _ ∪ ε)
((a-z) ∪ (A-Z) ∪ (0-9) ∪ _ ∪ ε)
((a-z) ∪ (A-Z) ∪ (0-9) ∪ _ ∪ ε)
((a-z) ∪ (A-Z) ∪ (0-9) ∪ _ ∪ ε).

While straightforward, the regular expression that we just wrote is a nuisance to
write and not very easy to read. The problem is that, so far, we have only three ways to
specify how many times a pattern must occur:

• α means that the pattern α must occur exactly once.

• α* means that the pattern α may occur any number (including zero) of times.

• α+ means that the pattern α may occur any positive number of times.

What we needed in the previous example was a way to specify how many times a
pattern α should occur. We can do this with the following notations:

• α{n, m} means that the pattern α must occur at least n times and no more than m times.

• α{n} means that the pattern α must occur exactly n times.

Using this notation, we can rewrite the regular expression of Example 6.15 as:

((a-z) ∪ (A-Z)) ((a-z) ∪ (A-Z) ∪ (0-9) ∪ _){3,7}.
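
The gain from the {n, m} notation can be seen by writing both forms of the password expression in a real dialect. The following is a minimal sketch in Python's re syntax, where a character class such as [a-zA-Z0-9_] stands for the union (a-z) ∪ (A-Z) ∪ (0-9) ∪ _. The names LETTER, CHAR, LONG_FORM, SHORT_FORM, and is_legal_password are ours.

```python
import re

LETTER = "[a-zA-Z]"
CHAR = "[a-zA-Z0-9_]"

# Long form of Example 6.15: one letter, three required characters,
# then four optional ones. (?:X|) plays the role of (X ∪ ε).
LONG_FORM = re.compile(LETTER + CHAR * 3 + f"(?:{CHAR}|)" * 4)

# Short form: ((a-z) ∪ (A-Z)) ((a-z) ∪ (A-Z) ∪ (0-9) ∪ _){3,7}
SHORT_FORM = re.compile(LETTER + CHAR + "{3,7}")


def is_legal_password(s):
    """True if s begins with a letter, uses only letters, digits, and
    underscores, and has between four and eight characters."""
    return SHORT_FORM.fullmatch(s) is not None
```

Both compiled patterns should classify any candidate string the same way; only the short form is pleasant to read.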

EXAMPLE  6.16  IP  Addresses 

The  following  regular  expression  searches  for  Internet  (IP)  addresses: 

([0-9]{1,3}(\.[0-9]{1,3}){3}).
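
A search of this kind might look as follows in Python. This is a minimal sketch that reads the expression of Example 6.16 as ([0-9]{1,3}(\.[0-9]{1,3}){3}); the names IP and find_ips are ours. Note that the expression constrains only how many digits appear in each field, not the numeric value of a field.

```python
import re

# One to three digits, followed by exactly three groups of
# a literal dot and one to three digits.
IP = re.compile(r"[0-9]{1,3}(\.[0-9]{1,3}){3}")


def find_ips(text):
    """Return every substring of text that matches the IP expression,
    scanning from left to right without overlaps."""
    return [match.group(0) for match in IP.finditer(text)]
```

Because only digit counts are checked, a string such as 999.999.999.999 is also reported, even though it is not a usable address.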


In XML, regular expressions are one way to define parts of new document
types. (Q.1.2)


6.4  Manipulating  and  Simplifying  Regular  Expressions 

The regular expressions (a ∪ b)* (a ∪ b)* and (a ∪ b)* define the same language. The
second one is simpler than the first and thus easier to work with. In this section we
discuss techniques for manipulating and simplifying regular expressions. All of these
techniques are based on the equivalence of the languages that the regular expressions define.
So we will say that, for two regular expressions α and β, α = β if L(α) = L(β).

We first consider identities that follow from the fact that the meaning of every
regular expression is a language, which means that it is a set:

• Union is commutative: For any regular expressions α and β, α ∪ β = β ∪ α.

• Union is associative: For any regular expressions α, β, and γ, (α ∪ β) ∪ γ =
α ∪ (β ∪ γ).

• ∅ is the identity for union: For any regular expression α, α ∪ ∅ = ∅ ∪ α = α.

• Union is idempotent: For any regular expression α, α ∪ α = α.

• Given any two sets A and B, if B ⊆ A, then A ∪ B = A. So, for example,
a* ∪ aa = a*, since L(aa) ⊆ L(a*).

Next  we  consider  identities  involving  concatenation: 

• Concatenation is associative: For any regular expressions α, β, and γ, (αβ)γ = α(βγ).

• ε is the identity for concatenation: For any regular expression α, αε = εα = α.

• ∅ is a zero for concatenation: For any regular expression α, α∅ = ∅α = ∅.

Concatenation distributes over union:

• For any regular expressions α, β, and γ, (α ∪ β)γ = (αγ) ∪ (βγ). Every string in
either of these languages is composed of a first part followed by a second part. The
first part must be drawn from L(α) or L(β). The second part must be drawn from
L(γ).

• For any regular expressions α, β, and γ, γ(α ∪ β) = (γα) ∪ (γβ). (By a similar
argument.)

Finally, we introduce identities involving Kleene star:

• ∅* = ε.

• ε* = ε.

• For any regular expression α, (α*)* = α*. L(α*) contains all and only the strings
that are composed of zero or more strings from L(α), concatenated together. All of
them are also in L((α*)*) since L((α*)*) contains, among other things, every
individual string in L(α*). No other strings are in L((α*)*) since it can contain only
strings that are formed from concatenating together elements of L(α*), which are
in turn concatenations of strings from L(α).

• For any regular expression α, α*α* = α*. Every string in either of these languages
is composed of zero or more strings from L(α) concatenated together.

• More generally, for any regular expressions α and β, if L(α*) ⊆ L(β*) then α*β* =
β*. For example:

a* (a ∪ b)* = (a ∪ b)*, since L(a*) ⊆ L((a ∪ b)*).

α* is redundant because any string it can generate and place at the beginning of a string
to be generated by the combined expression α*β* can also be generated by β*.

• Similarly, if L(β*) ⊆ L(α*) then α*β* = α*.

• For any regular expressions α and β, (α ∪ β)* = (α*β*)*. To form a string in
either language, a generator must walk through the Kleene star loop zero or
more times. Using the first expression, each time through the loop it chooses
either a string from L(α) or a string from L(β). That process can be copied
using the second expression by picking exactly one string from L(α) and then
ε from L(β*), or ε from L(α*) and then one string from L(β). Using the second
expression, a generator can pick a sequence of strings from L(α) and then a
sequence of strings from L(β) each time through the loop. But that process can
be copied using the first expression by simply selecting each element of the
sequence one at a time on successive times through the loop.

• For any regular expressions α and β, if L(β) ⊆ L(α*) then (α ∪ β)* = α*. For
example, (a ∪ ε)* = a*, since {ε} ⊆ L(a*). β is redundant since any string it can
generate can also be generated by α*.
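
Identities like these can be spot-checked by enumerating short strings. The following is a minimal sketch, assuming Python's re dialect (| for ∪, an empty alternative for ε); the helper names language and same_up_to_bound are ours. It compares the two languages only on strings up to a fixed length, so a True result is supporting evidence rather than a proof, while a False result is a genuine counterexample to the claimed identity.

```python
import re
from itertools import product


def language(regex, alphabet="ab", max_len=6):
    """Return the set of strings over alphabet, of length at most max_len,
    that the regular expression matches in their entirety."""
    pattern = re.compile(regex)
    strings = ("".join(letters)
               for n in range(max_len + 1)
               for letters in product(alphabet, repeat=n))
    return {s for s in strings if pattern.fullmatch(s)}


def same_up_to_bound(r1, r2, alphabet="ab", max_len=6):
    """True if r1 and r2 agree on every string of length at most max_len."""
    return language(r1, alphabet, max_len) == language(r2, alphabet, max_len)
```

For example, same_up_to_bound("(a|b)*", "(a*b*)*") tests the identity (α ∪ β)* = (α*β*)* with α = a and β = b.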
EXAMPLE 6.17 Simplifying a Regular Expression


((a* ∪ ∅)* ∪ aa) (b ∪ bb)* b* ((a ∪ b)* b* ∪ ab)* =    /* L(∅) ⊆ L(a*).
((a*)* ∪ aa) (b ∪ bb)* b* ((a ∪ b)* b* ∪ ab)* =
(a* ∪ aa) (b ∪ bb)* b* ((a ∪ b)* b* ∪ ab)* =           /* L(aa) ⊆ L(a*).
a* (b ∪ bb)* b* ((a ∪ b)* b* ∪ ab)* =                  /* L(bb) ⊆ L(b*).
a* b* b* ((a ∪ b)* b* ∪ ab)* =
a* b* ((a ∪ b)* b* ∪ ab)* =                            /* L(b*) ⊆ L((a ∪ b)*).
a* b* ((a ∪ b)* ∪ ab)* =                               /* L(ab) ⊆ L((a ∪ b)*).
a* b* ((a ∪ b)*)* =
a* b* (a ∪ b)* =                                       /* L(b*) ⊆ L((a ∪ b)*).
a* (a ∪ b)* =                                          /* L(a*) ⊆ L((a ∪ b)*).
(a ∪ b)*.

Exercises 

1. Describe in English, as briefly as possible, the language defined by each of these
regular expressions:

a. (b ∪ ba) (b ∪ a)* (ab ∪ b).

b. (((a*b*)*ab) ∪ ((a*b*)*ba))(b ∪ a)*.

2. Write a regular expression to describe each of the following languages:

a. {w ∈ {a, b}* : every a in w is immediately preceded and followed by b}.

b. {w ∈ {a, b}* : w does not end in ba}.

c. {w ∈ {0, 1}* : ∃y ∈ {0, 1}* (|wy| is even)}.

d. {w ∈ {0, 1}* : w corresponds to the binary encoding, without leading 0s, of
natural numbers that are evenly divisible by 4}.

e. {w ∈ {0, 1}* : w corresponds to the binary encoding, without leading 0s, of
natural numbers that are powers of 4}.

f. {w ∈ {0-9}* : w corresponds to the decimal encoding, without leading 0s, of
an odd natural number}.

g. {w ∈ {0, 1}* : w has 001 as a substring}.

h. {w ∈ {0, 1}* : w does not have 001 as a substring}.

i. {w ∈ {a, b}* : w has bba as a substring}.

j. {w ∈ {a, b}* : w has both aa and bb as substrings}.

k. {w ∈ {a, b}* : w has both aa and aba as substrings}.

l. {w ∈ {a, b}* : w contains at least two b's that are not followed by an a}.

m. {w ∈ {0, 1}* : w has at most one pair of consecutive 0s and at most one pair
of consecutive 1s}.




n. {w ∈ {0, 1}* : none of the prefixes of w ends in 0}.

o. {w ∈ {a, b}* : #a(w) ≡₃ 0}.

p. {w ∈ {a, b}* : #a(w) ≤ 3}.

q. {w ∈ {a, b}* : w contains exactly two occurrences of the substring aa}.

r. {w ∈ {a, b}* : w contains no more than two occurrences of the substring
aa}.

s. {w ∈ {a, b}* - L}, where L = {w ∈ {a, b}* : w contains bba as a
substring}.

t. {w ∈ {0, 1}* : every odd length string in L begins with 11}.

u. {w ∈ {0-9}* : w represents the decimal encoding of an odd natural number
without leading 0s}.

v. L1 - L2, where L1 = a*b*c* and L2 = c*b*a*.

w. The set of legal United States zip codes.

x. The set of strings that correspond to domestic telephone numbers in your
country.

3.  Simplify  each  of  the  following  regular  expressions: 

a. (a ∪ b)* (a ∪ ε) b*.

b. (∅* ∪ b) b*.

c. (a ∪ b)*a* ∪ b.

d. ((a ∪ b)*)*.

e. ((a ∪ b)+)*.

f. a ((a ∪ b)(b ∪ a))* ∪ a ((a ∪ b) a)* ∪ a ((b ∪ a) b)*.

4. For each of the following expressions E, answer the following three questions
and prove your answer:

i. Is E a regular expression?

ii. If E is a regular expression, give a simpler regular expression.

iii. Does E describe a regular language?

a. ((a ∪ b) ∪ (ab))*.

b.  (a+  aV). 

c. ((ab)* ∅).

d. (((ab) ∪ c)* ∩ (b ∪ c*)).

e. (∅* ∪ (bb*)).

5. Let L = {aⁿbⁿ : 0 ≤ n ≤ 4}.

a. Show a regular expression for L.

b. Show an FSM that accepts L.

6. Let L = {w ∈ {1, 2}* : for all prefixes p of w, if |p| > 0 and |p| is even, then the
last character of p is 1}.

a. Write a regular expression for L.

b. Show an FSM that accepts L.
7. Use the algorithm presented in the proof of Kleene's Theorem to construct an FSM
to accept the language generated by each of the following regular expressions:

a. (b(b ∪ ε)b)*.

b. bab ∪ a*.

8.  Let  L  be  the  language  accepted  by  the  following  finite  state  machine: 


Indicate, for each of the following regular expressions, whether it correctly
describes L:

a. (a ∪ ba)bb*a.

b. (ε ∪ b)a(bb*a)*.

c. ba ∪ ab*a.

d. (a ∪ ba)(bb*a)*.

9.  Consider  the  following  FSM  M: 


a.  Show  a  regular  expression  for  L(M). 

b.  Describe  L(M)  in  English. 

10. Consider the FSM M of Example 5.3. Use fsmtoregexheuristic to construct a
regular expression that describes L(M).

11. Consider the FSM M of Example 6.9. Apply fsmtoregex to M and show the
regular expression that results.

12. Consider the FSM M of Example 6.8. Apply fsmtoregex to M and show the regular
expression that results. (Hint: This one is exceedingly tedious, but it can be done.)

13.  Show  a  possibly  nondeterministic  FSM  to  accept  the  language  defined  by  each  of 
the  following  regular  expressions: 

a. (((a ∪ ba) b ∪ aa)*

b. (b ∪ ε)(ab)*(a ∪ ε).

c. (babb* ∪ a)*.

d. (ba ∪ ((a ∪ bb) a*b)).

e. (a ∪ b)*aa(b ∪ aa)bb(a ∪ b)*.

14.  Show  a  DFSM  to  accept  the  language  defined  by  each  of  the  following  regular 
expressions: 

a. (aba ∪ aabaa)*.

b. (ab)*(aab)*.

15.  Consider  the  following  DFSM  M: 


a. Write a regular expression that describes L(M).

b. Show a DFSM that accepts ¬L(M).

16. Given the following DFSM M, write a regular expression that describes ¬L(M):


17. Add the keyword able to the set in Example 6.13 and show the FSM that will be
built by buildkeywordFSM from the expanded keyword set.

18. Let Σ = {a, b}. Let L = {ε, a, b}. Let R be a relation defined on Σ* as follows:
∀x, y (xRy iff y = xb). Let R′ be the reflexive, transitive closure of R. Let L′ =
{x : ∃y ∈ L (yR′x)}. Write a regular expression for L′.

19. In Appendix O we summarize the main features of the regular expression
language in Perl. What feature of that regular expression language makes it possible
to write regular expressions that describe languages that aren't regular?

20. For each of the following statements, state whether it is True or False. Prove your
answer.

a. (ab)*a = a(ba)*.

b. (a ∪ b)* b (a ∪ b)* = a* b (a ∪ b)*.

c. (a ∪ b)* b (a ∪ b)* ∪ (a ∪ b)* a (a ∪ b)* = (a ∪ b)*.

d. (a ∪ b)* b (a ∪ b)* ∪ (a ∪ b)* a (a ∪ b)* = (a ∪ b)+.

e. (a ∪ b)* b a (a ∪ b)* ∪ a*b* = (a ∪ b)*.

f. a* b (a ∪ b)* = (a ∪ b)* b (a ∪ b)*.

g.  If  a  and  /3  are  any  two  regular  expressions,  then  (ir  U  /3)*  =  n  (fiat  U  «). 

h. If α and β are any two regular expressions, then (αβ)*α = α(βα)*.


CHAPTER  7 

Regular  Grammars* 

So  far,  we  have  considered  two  equivalent  ways  to  describe  exactly  the  class  of 
regular  languages: 

•  Finite  state  machines. 

•  Regular  expressions. 

We  now  introduce  a  third: 

•  Regular  grammars  (sometimes  also  called  right  linear  grammars). 


7.1  Definition  of  a  Regular  Grammar 

A regular grammar G is a quadruple (V, Σ, R, S), where:

• V is the rule alphabet, which contains nonterminals (symbols that are used in the
grammar but that do not appear in strings in the language) and terminals (symbols
that can appear in strings generated by G),

• Σ (the set of terminals) is a subset of V,

• R (the set of rules) is a finite set of rules of the form X → Y, and

• S (the start symbol) is a nonterminal.

In a regular grammar, all rules in R must:

• have a left-hand side that is a single nonterminal, and

• have a right-hand side that is ε or a single terminal or a single terminal followed by
a single nonterminal.

So S → a, S → ε, and T → aS are legal rules in a regular grammar. S → aSa and aSa → T
are not legal rules in a regular grammar.

We will formalize the notion of a grammar generating a language in Chapter 11,
when we introduce a more powerful grammatical framework, the context-free
grammar. For now, an informal notion will do. The language generated by a grammar
G = (V, Σ, R, S), denoted L(G), is the set of all strings w in Σ* such that it is possible
to start with S, apply some finite sequence of rules in R, and derive w.

To make writing grammars easy, we will adopt the convention that, unless otherwise
specified, the start symbol of any grammar G will be the symbol on the left-hand side
of the first rule in R.

EXAMPLE  7.1  Even  Length  Strings 

Let L = {w ∈ {a, b}* : |w| is even}. The following regular expression defines L:

((aa) ∪ (ab) ∪ (ba) ∪ (bb))*.

The following DFSM M accepts L:


[Figure: a two-state DFSM with states S (the start state, accepting) and T. Transitions
labeled a, b lead from S to T and from T to S.]

The following regular grammar G also defines L:

S → ε
S → aT
S → bT
T → aS
T → bS

In G, the job of the nonterminal S is to generate an even length string. It does
this either by generating the empty string or by generating a single character and
then creating T. The job of T is to generate an odd length string. It does this by
generating a single character and then creating S. S generates ε, the shortest
possible even length string. So, if T can be shown to generate all and only the odd
length strings, we can show that S generates all and only the remaining even
length strings. T generates every string whose length is one greater than the
length of some string S generates. So, if S generates all and only the even length
strings, then T generates all and only the other odd length strings.

Notice the clear correspondence between M and G, which we have highlighted
by naming M's states S and T. Even length strings drive M to state S. Even length
strings are generated by G starting with S. Odd length strings drive M to state T.
Odd length strings are generated by G starting with T.
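
The informal notion of derivation used in this example can be made concrete. The following is a minimal sketch, under the assumptions that nonterminals are single uppercase letters, terminals are lowercase letters, and ε is written as the empty string; the names RULES and generate are ours. It applies rules breadth-first from the start symbol and collects every terminal string of bounded length that can be derived.

```python
from collections import deque

# The grammar G of Example 7.1, as (left-hand side, right-hand side) pairs.
RULES = [("S", ""), ("S", "aT"), ("S", "bT"), ("T", "aS"), ("T", "bS")]


def generate(rules, start="S", max_len=4):
    """Return all terminal strings of length at most max_len derivable from start."""
    results = set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # In a regular grammar a sentential form contains at most one
        # nonterminal, and it is always the last symbol.
        if not form or not form[-1].isupper():
            results.add(form)
            continue
        prefix, nonterminal = form[:-1], form[-1]
        for lhs, rhs in rules:
            if lhs != nonterminal:
                continue
            new_form = prefix + rhs
            terminal_count = sum(1 for ch in new_form if not ch.isupper())
            if terminal_count <= max_len:
                queue.append(new_form)
    return results
```

With the default bound of 4, generate(RULES) should return the empty string together with every string over {a, b} of length 2 or 4.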
7.2 Regular Grammars and Regular Languages

THEOREM  7.1  Regular  Grammars  Define  Exactly  the  Regular  Languages 

Theorem:  The  class  of  languages  that  can  be  defined  with  regular  grammars  is  exactly 
the  regular  languages. 

Proof:  We  first  show  that  any  language  that  can  be  defined  with  a  regular  grammar 
can  be  accepted  by  some  FSM  and  so  is  regular.  Then  we  must  show  that  every 
regular  language  (i.e.,  every  language  that  can  be  accepted  by  some  FSM)  can  be 
defined  with  a  regular  grammar.  Both  proofs  are  by  construction. 

Regular grammar → FSM: The following algorithm constructs an FSM M
from a regular grammar G = (V, Σ, R, S) and assures that L(M) = L(G):

grammartofsm(G: regular grammar) =

1. Create in M a separate state for each nonterminal in V.

2. Make the state corresponding to S the start state.

3. If there are any rules in R of the form X → w, for some w ∈ Σ, then create
an additional state labeled #.

4. For each rule of the form X → wY, add a transition from X to Y labeled w.

5. For each rule of the form X → w, add a transition from X to # labeled w.

6. For each rule of the form X → ε, mark state X as accepting.

7. Mark state # as accepting.

8. If M is incomplete (i.e., there are some (state, input) pairs for which no
transition is defined), M requires a dead state. Add a new state D. For
every (q, i) pair for which no transition has already been defined, create a
transition from q to D labeled i. For every i in Σ, create a transition from
D to D labeled i.

FSM → Regular grammar: The construction is effectively the reverse of the
one we just did. We leave this step as an exercise.
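
Steps 1 through 7 of grammartofsm can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: rules are (left-hand side, right-hand side) pairs, nonterminals and terminals are single characters, ε is the empty string, and the dead state of step 8 is omitted because the simulation simply drops a path when no transition is defined. The names grammar_to_fsm and accepts are ours.

```python
def grammar_to_fsm(rules, start):
    """Build a (possibly nondeterministic) FSM from a regular grammar.
    Returns (delta, start, accepting), where delta maps a (state, symbol)
    pair to the set of states reachable on that symbol."""
    delta = {}
    accepting = set()
    for lhs, rhs in rules:
        if rhs == "":                      # X → ε: X is accepting (step 6)
            accepting.add(lhs)
        elif len(rhs) == 1:                # X → w: go to # on w (steps 3, 5, 7)
            delta.setdefault((lhs, rhs), set()).add("#")
            accepting.add("#")
        else:                              # X → wY: go to Y on w (step 4)
            terminal, nonterminal = rhs[0], rhs[1]
            delta.setdefault((lhs, terminal), set()).add(nonterminal)
    return delta, start, accepting


def accepts(fsm, s):
    """Simulate the FSM on s by tracking the set of states it could be in."""
    delta, start, accepting = fsm
    current = {start}
    for c in s:
        current = {q for p in current for q in delta.get((p, c), set())}
    return bool(current & accepting)
```

Tracking a set of current states handles the nondeterminism that arises when two rules share a left-hand side and a leading terminal.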


EXAMPLE  7.2  Strings  that  End  with  aaaa 

Let L = {w ∈ {a, b}* : w ends with the pattern aaaa}. Alternatively, L = (a ∪ b)*
aaaa. The following regular grammar defines L:

S → aS     /* An arbitrary number of a's and b's can be generated
S → bS        before the pattern starts.
S → aB     /* Generate the first a of the pattern.
B → aC     /* Generate the second a of the pattern.
C → aD     /* Generate the third a of the pattern.
D → a      /* Generate the last a of the pattern and quit.

Applying grammartofsm to this grammar, we get, omitting the dead state:


[Figure: the FSM built by grammartofsm. The start state S has a self-loop labeled a, b.]


Notice that the machine that grammartofsm builds is not necessarily deterministic.

EXAMPLE  7.3  The  Missing  Letter  Language 

Let Σ = {a, b, c}. Let L be L_Missing = {w : there is a symbol a_i ∈ Σ not appearing
in w}, which we defined in Example 5.12. The following grammar G generates
L_Missing:

S → ε
S → aB
S → aC
S → bA
S → bC
S → cA
S → cB
A → bA
A → cA
A → ε
B → aB
B → cB
B → ε
C → aC
C → bC
C → ε

The job of S is to generate some string in L_Missing. It does that by choosing a first
character of the string and then choosing which other character will be missing.
The job of A is to generate all strings that do not contain any a's. The job of B is to
generate all strings that do not contain any b's. And the job of C is to generate all
strings that do not contain any c's.

If we apply grammartofsm to G, we get M =

[Figure: the FSM M built by grammartofsm from G.]

M is identical to the NDFSM we had previously built for L_Missing except that it
waits to guess whether to go to A, B or C until it has seen its first input character.


Our proof of the first half of Theorem 7.1 clearly describes the correspondence
between the nonterminals in a regular grammar and the states in a corresponding FSM.
This correspondence suggests a natural way to think about the design of a regular
grammar. The nonterminals in such a grammar need to "remember" the relevant state
of a left-to-right analysis of a string.

EXAMPLE  7.4  Satisfying  Multiple  Criteria 

Let  L  =  {w  e  {a,  b}*:  w  contains  an  odd  number  of  a’s  and  w  ends  in  a}.  We  can 
write  a  regular  grammar  G  that  defines  L.  G  will  contain  four  nonterminals,  each 
with  a  unique  function  (corresponding  to  the  states  of  a  simple  FSM  that  accepts 
L).  So,  in  any  derived  string,  if  the  remaining  nonterminal  is: 

•  S,  then  the  number  of  a’s  so  far  is  even.  We  don’t  have  worry  about  whether 
the  string  ends  in  a  since,  to  derive  a  string  in  L,  it  will  be  necessary  to  gener¬ 
ate  at  least  one  more  a  anyway. 
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EXAMPLE  7.4  ( Continued ) 

•  T,  then  the  number  of  a’s  so  far  is  odd  and  the  derived  string  ends  in  a. 

•  X ,  then  the  number  of  a's  so  far  is  odd  and  the  derived  string  does  not  end  in  a. 

Since  only  T  captures  the  situation  in  which  the  number  of  a’s  so  far  is  odd  and 
the  derived  string  ends  in  a,  T  is  the  only  nonterminal  that  can  generate  e.  G 
contains  the  following  rules: 

S  -*  bS  /*  Initial  b’s  don’t  matter. 

S  -*  aT  /*  After  this,  the  number  of  a’s  is  odd  and  the  generated 
string  ends  in  a. 

T-+  e  I*  Since  the  number  of  a’s  is  odd,  and  the  string  ends  in 

a,  it's  okay  to  quit. 

T  -*  aS  /*  After  this,  the  number  of  a’s  will  be  even  again. 

T-*bX  I*  After  this,  the  number  of  a’s  is  still  odd  but  the  gener¬ 

ated  string  no  longer  ends  in  a. 

X-*aS  /*  After  this,  the  number  of  a’s  will  be  even. 

X—*bX  I*  After  this,  the  number  of  a’s  is  still  odd  and  the  gen¬ 

erated  string  still  does  not  end  in  a. 

To  see  how  this  grammar  works,  we  can  watch  it  generate  the  string  baaba: 

S ⇒ bS        /* Still an even number of a’s.
  ⇒ baT       /* Now an odd number of a’s and ends in a. The process could quit now since the derived string, ba, is in L.
  ⇒ baaS      /* Back to having an even number of a’s, so it doesn’t matter what the last character is.
  ⇒ baabS     /* Still even a’s.
  ⇒ baabaT    /* Now an odd number of a’s and ends in a. The process can quit, by applying the rule T → ε.
  ⇒ baaba
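
The derivation above can also be traced mechanically. The following is a minimal Python sketch, not part of the original text: the table RULES encodes the seven rules of G, and the helper names generates and in_L are our own. It checks whether a string can be derived from S and compares the answer with the direct definition of L.

```python
# Rules of G from Example 7.4, written as nonterminal -> list of right-hand sides.
# Each right-hand side is (terminal, next_nonterminal); ("", None) encodes T -> ε.
RULES = {
    "S": [("b", "S"), ("a", "T")],
    "T": [("", None), ("a", "S"), ("b", "X")],
    "X": [("a", "S"), ("b", "X")],
}

def generates(w, nonterminal="S"):
    """Return True if w can be derived starting from the given nonterminal."""
    for terminal, nxt in RULES[nonterminal]:
        if nxt is None:
            # ε-rule: the derivation may stop only if no input is left.
            if w == "":
                return True
        elif w.startswith(terminal) and generates(w[1:], nxt):
            return True
    return False

def in_L(w):
    """Direct definition of L: an odd number of a's and the string ends in a."""
    return w.count("a") % 2 == 1 and w.endswith("a")
```

For instance, generates("baaba") follows exactly the derivation shown above, while generates("baab") fails because S has no ε-rule.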

So now we know that regular grammars define exactly the regular languages. But
regular grammars are not often used in practice. The reason, though, is not that they
couldn’t be. It is simply that there is something better. Given some regular language
L, the structure of a reasonable FSM for L very closely mirrors the structure of a
reasonable regular grammar for it. And FSMs are easier to work with. In addition,
there exist regular expressions. In Parts III and IV, as we move outward to larger
classes of languages, there will no longer exist a technique like regular expressions.
At that point, particularly as we are considering the context-free languages, we will
see that grammars are a very important and useful way to define languages.


Exercises 

1.  Show  a  regular  grammar  for  each  of  the  following  languages: 

a. {w ∈ {a, b}* : w contains an even number of a’s and an odd number of b’s}.

b. {w ∈ {a, b}* : w does not end in aa}.

c. {w ∈ {a, b}* : w contains the substring abb}.

d. {w ∈ {a, b}* : if w contains the substring aa then |w| is odd}.

e. {w ∈ {a, b}* : w does not contain the substring aabb}.

2.  Consider  the  following  regular  grammar  G: 

S → aT
T → bT
T → a
T → aW
W → ε
W → aT

a.  Write  a  regular  expression  that  generates  L(G). 

b.  Use  grammartofsm  to  generate  an  FSM  M  that  accepts  L(G). 

3. Consider again the FSM M shown in Exercise 5.1. Show a regular grammar that
generates L(M).

4. Show by construction that, for every FSM M there exists a regular grammar G
such that L(G) = L(M).

5. Let L = {w ∈ {a, b}* : every a in w is immediately followed by at least one b}.

a.  Write  a  regular  expression  that  describes  L. 

b.  Write  a  regular  grammar  that  generates  L. 

c.  Construct  an  FSM  that  accepts  L. 


CHAPTER  8 


Regular  and  Nonregular  Languages 


The language a*b* is regular. The language A^nB^n = {a^n b^n : n ≥ 0} is not regular
(intuitively because it is not possible, given some finite number of states, to
count an arbitrary number of a’s and then compare that count to the number of
b’s). The language {w ∈ {a, b}* : every a is immediately followed by a b} is regular. The
similar sounding language {w ∈ {a, b}* : every a has a matching b somewhere and no b
matches more than one a} is not regular (again because it is now necessary to count the
a’s and make sure that the number of b’s is at least as great as the number of a’s).

Given  a  new  language  L,  how  can  we  know  whether  or  not  it  is  regular?  In  this 
chapter,  we  present  a  collection  of  techniques  that  can  be  used  to  answer  that  question. 

8.1  How  Many  Regular  Languages  Are  There? 

First, we observe that there are many more nonregular languages than there are regular
ones:

THEOREM  8.1  The  Regular  Languages  are  Countably  Infinite 

Theorem:  There  is  a  countably  infinite  number  of  regular  languages. 

Proof: We can lexicographically enumerate all the syntactically legal DFSMs with
input alphabet Σ. Every regular language is accepted by at least one of them. So
there cannot be more regular languages than there are DFSMs. Thus there are at
most a countably infinite number of regular languages. There is not a one-to-one
relationship between regular languages and DFSMs since there is an infinite
number of machines that accept any given language. But the number of regular
languages is infinite because it includes the following infinite set of languages:

{a}, {aa}, {aaa}, {aaaa}, {aaaaa}, {aaaaaa}, ...

But, by Theorem 2.3, there is an uncountably infinite number of languages over any
nonempty alphabet. So there are many more nonregular languages than there are
regular ones.

8.2  Showing  That  a  Language  Is  Regular 

But many languages are regular. How can we know which ones? We start with the
simplest cases.

THEOREM 8.2 The Finite Languages

Theorem:  Every  finite  language  is  regular. 

Proof: If L is the empty set, then it is defined by the regular expression ∅ and so is
regular. If it is any finite language composed of the strings s₁, s₂, ..., sₙ for some
positive integer n, then it is defined by the regular expression:

s₁ ∪ s₂ ∪ ... ∪ sₙ

So it too is regular.
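
The construction in this proof is easy to mirror in code. The sketch below is an illustration under our own assumptions, not something from the text: it uses Python's re module, writes union as |, and uses the never-matching pattern (?!) to stand in for ∅. The function names finite_language_regex and accepts are hypothetical.

```python
import re

def finite_language_regex(strings):
    """Build the pattern s1|s2|...|sn for a finite set of strings."""
    if not strings:
        # The empty language: a pattern that can never match (stands in for ∅).
        return r"(?!)"
    # re.escape protects characters that are special in Python's regex syntax.
    return "|".join(re.escape(s) for s in sorted(strings))

def accepts(strings, w):
    """Return True if w is one of the strings, decided through the regex."""
    return re.fullmatch(finite_language_regex(strings), w) is not None
```

As the discussion after Example 8.2 points out, a set lookup or hash table would be the more practical tool here; the point of the sketch is only that the regular expression exists.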


EXAMPLE  8.1  The  Intersection  of  Two  Infinite  Languages 

Let L = L₁ ∩ L₂, where L₁ = {a^n b^n : n ≥ 0} and L₂ = {b^n a^n : n ≥ 0}. As we
will soon be able to prove, neither L₁ nor L₂ is regular. But L is. L = {ε},
which is finite.


EXAMPLE  8.2  A  Finite  Language  We  May  Not  Be  Able  to  Write  Down 

Let L = {w ∈ {0 - 9}* : w is the social security number of a living US resident}.
L  is  regular  because  it  is  finite.  It  doesn’t  matter  that  no  individual  or  organization 
happens,  at  any  given  instant,  to  know  what  strings  are  in  L. 


Note, however, that although the language in Example 8.2 is formally regular, the
techniques that we have described for recognizing regular languages would not be very
useful in building a program to check for a valid social security number. Regular
expressions are most useful when the elements of L match one or more patterns. FSMs
are most useful when the elements of L share some simple structural properties. Other
techniques, like hash tables, are better suited to handling finite languages whose
elements are chosen by our world, rather than by rule.


EXAMPLE 8.3 Santa Claus, God, and the History of the Americas

Let: 

• L₁ = {w ∈ {0 - 9}* : w is the social security number of the current US president}.

• L₂ = {1 if Santa Claus exists and 0 otherwise}.

• L₃ = {1 if God exists and 0 otherwise}.

• L₄ = {1 if there were people in North America more than 10,000 years ago
and 0 otherwise}.

• L₅ = {1 if there were people in North America more than 15,000 years ago
and 0 otherwise}.

• L₆ = {w ∈ {0 - 9}* : w is the decimal representation, without leading 0’s, of
a prime Fermat number}.

L₁ is clearly finite, and thus regular. There exists a simple FSM to accept it, even
though none of us happens to know what that FSM is. L₂ and L₃ are perhaps a little
less clear, but that is because the meanings of “Santa Claus” and “God” are less
clear. Pick a definition for either of them. Then something that satisfies that
definition either does or does not exist. So either the simple FSM that accepts {0} and
nothing else or the simple FSM that accepts {1} and nothing else accepts L₂. And
one of them (possibly the same one, possibly the other one) accepts L₃. L₄ is clear.
It is the set {1}. L₅ is also finite, and thus regular. Either there were people in North
America by 15,000 years ago or there were not, although the currently available
fossil evidence is unclear as to which. So we (collectively) just don’t know yet which
machine to build. L₆ is similar, although this time what is lacking is mathematics, as
opposed to fossils. Recall from Section 4.1 that the Fermat numbers are defined by

F_n = 2^(2^n) + 1, n ≥ 0.

The first five elements of F_n are {3, 5, 17, 257, 65,537}. All of them are prime. It
appears likely that no other Fermat numbers are prime. If that is true, then L₆
is finite and thus regular. If it turns out that the set of prime Fermat numbers is
infinite, then it is almost surely not regular.
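
The arithmetic behind the Fermat numbers can be checked directly. The following short Python sketch is ours, not the book's; the names fermat and is_prime are illustrative, and trial division is used only because the numbers involved are small. It computes the first five Fermat numbers, confirms that each is prime, and shows that the next one, F_5, is not.

```python
def fermat(n):
    """Return the nth Fermat number F_n = 2^(2^n) + 1."""
    return 2 ** (2 ** n) + 1

def is_prime(m):
    """Trial division; adequate for the small values used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

# F_0 through F_4, the five Fermat numbers listed in the text.
first_five = [fermat(n) for n in range(5)]
```

F_5 = 4,294,967,297 is divisible by 641, so is_prime(fermat(5)) returns False after only a few hundred trial divisions.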


Not every regular language is computationally tractable. Consider the Towers
of Hanoi language. (P. 2)


But, of course, most interesting regular languages are infinite. So far, we’ve developed
four techniques for showing that a (finite or infinite) language L is regular:

• Exhibit a regular expression for L.

• Exhibit an FSM for L.

• Show that the number of equivalence classes of ≈L is finite.

•  Exhibit  a  regular  grammar  for  L. 

8.3  Some Important Closure Properties of Regular Languages

We now consider one final technique, which allows us, when analyzing complex
languages, to exploit the other techniques as subroutines. The regular languages are closed
under many common and useful operations. So, if we wish to show that some language
L is regular and we can show that L can be constructed from other regular languages
using those operations, then L must also be regular.

THEOREM  8.3  Closure  under  Union,  Concatenation  and  Kleene  Star 

Theorem:  The  regular  languages  are  closed  under  union,  concatenation,  and  Kleene 
star. 

Proof: By the same constructions that were used in the proof of Kleene’s theorem.

THEOREM  8.4  Closure  under  Complement,  Intersection,  Difference,  Reverse 
and  Letter  Substitution 

Theorem: The regular languages are closed under complement, intersection,
difference, reverse, and letter substitution.

Proof:

• The regular languages are closed under complement. If L₁ is regular, then
there exists a DFSM M₁ = (K, Σ, δ, s, A) that accepts it. The DFSM
M₂ = (K, Σ, δ, s, K − A), namely M₁ with accepting and nonaccepting states
swapped, accepts ¬(L(M₁)) because it accepts all strings that M₁ rejects and
rejects all strings that M₁ accepts.

Given an arbitrary (possibly nondeterministic) FSM M₁ = (K₁, Σ, Δ₁, s₁, A₁),
we can construct a DFSM M₂ = (K₂, Σ, δ₂, s₂, A₂) such that L(M₂) = ¬(L(M₁)).
We do so as follows: From M₁, construct an equivalent deterministic FSM M′ =
(K_M′, Σ, δ_M′, s_M′, A_M′), using the algorithm ndfsmtodfsm, presented in the proof
of Theorem 5.3. (If M₁ is already deterministic, M′ = M₁.) M′ must be stated
completely, so if it is described with an implied dead state, add the dead state and all
required transitions to it. Begin building M₂ by setting it equal to M′. Then swap the
accepting and the nonaccepting states. So M₂ = (K_M′, Σ, δ_M′, s_M′, K_M′ − A_M′).

• The regular languages are closed under intersection. We note that:

L(M₁) ∩ L(M₂) = ¬(¬L(M₁) ∪ ¬L(M₂)).

We have already shown that the regular languages are closed under both
complement and union. Thus they are also closed under intersection.

It is also possible to prove this claim by construction of an FSM that accepts
L(M₁) ∩ L(M₂). We leave that proof as an exercise.

• The regular languages are closed under set difference (subtraction). We note
that:

L(M₁) − L(M₂) = L(M₁) ∩ ¬L(M₂).

We have already shown that the regular languages are closed under both
complement and intersection. Thus they are also closed under set difference.

This claim can also be proved by construction, which we leave as an
exercise.

• The regular languages are closed under reverse. Recall that L^R = {w ∈ Σ* :
w = x^R for some x ∈ L}. We leave the proof of this as an exercise.

• The regular languages are closed under letter substitution, defined as follows:
Consider any two alphabets, Σ₁ and Σ₂. Let sub be any function from Σ₁ to
Σ₂*. Then letsub is a letter substitution function from L₁ to L₂ iff letsub(L₁) =
{w ∈ Σ₂* : ∃y ∈ L₁ (w = y except that every character c of y has been replaced
by sub(c))}. For example, suppose that Σ₁ = {a, b}, Σ₂ = {0, 1}, sub(a) = 0,
and sub(b) = 11. Then letsub({a^n b^n : n ≥ 0}) = {0^n 1^(2n) : n ≥ 0}. We leave
the proof that the regular languages are closed under letter substitution as an
exercise.
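
The proof mentions that closure under intersection can also be shown by constructing a machine directly. One common way to do that is the product construction, sketched below in Python. This is a hedged illustration, not the book's solution to the exercise: the tuple layout (states, alphabet, delta, start, accepting), the function names, and the two example machines EVEN_A and ENDS_B are all our own assumptions, and both input machines are assumed to be complete DFSMs over the same alphabet.

```python
from itertools import product

def intersect(m1, m2):
    """Product construction for two complete DFSMs over the same alphabet.

    Each machine is a tuple (states, alphabet, delta, start, accepting),
    where delta maps (state, character) to the next state.
    """
    k1, sigma, d1, s1, a1 = m1
    k2, _, d2, s2, a2 = m2
    states = set(product(k1, k2))
    # The product machine runs both machines in lockstep on each character.
    delta = {((p, q), c): (d1[p, c], d2[q, c])
             for (p, q) in states for c in sigma}
    # Accept only when both component machines accept.
    accepting = {(p, q) for (p, q) in states if p in a1 and q in a2}
    return states, sigma, delta, (s1, s2), accepting

def accepts(m, w):
    """Run a complete DFSM on w and report whether it ends in an accepting state."""
    _, _, delta, state, accepting = m
    for c in w:
        state = delta[state, c]
    return state in accepting

# Hypothetical example machines over {a, b}.
EVEN_A = ({"e", "o"}, {"a", "b"},
          {("e", "a"): "o", ("e", "b"): "e", ("o", "a"): "e", ("o", "b"): "o"},
          "e", {"e"})
ENDS_B = ({"n", "y"}, {"a", "b"},
          {("n", "a"): "n", ("n", "b"): "y", ("y", "a"): "n", ("y", "b"): "y"},
          "n", {"y"})
BOTH = intersect(EVEN_A, ENDS_B)
```

Changing the accepting condition to "p in a1 and q not in a2" would give a machine for the set difference instead, which matches the identity L(M₁) − L(M₂) = L(M₁) ∩ ¬L(M₂) used in the proof.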


EXAMPLE  8.4  Closure  Under  Complement 
Consider  the  following  NDFSM  M  = 


If we use the algorithm that we just described to convert M to a new machine
M′ that accepts ¬L(M), the last step is to swap the accepting and the nonaccepting
states. A quick look at M makes it clear why it is necessary first to make M
deterministic and then to complete it by adding the dead state. M accepts the input
a in state 4. If we simply swapped accepting and nonaccepting states, without
making the other changes, M′ would also accept a. It would do so in state 2. The
problem is that M is nondeterministic, and has one path along which a is accepted
and one along which it is rejected.

To see why it is necessary to add the dead state, consider the input string
aba. M rejects it since the path from state 3 dies when M attempts to read the
final a and the path from state 4 dies when it attempts to read the b. But, if we
don’t add the dead state, M′ will also reject it since, in it too, both paths will die.
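
The role of the dead state can be seen in a small experiment. The Python sketch below is an illustration under stated assumptions: it does not reproduce the machine M of this example (its diagram is not available here) but uses a hypothetical three-state partial DFSM for the language {ab}, and it assumes the machine is already deterministic, so the ndfsmtodfsm step is omitted. The names complete, complement, accepts, M, NOT_M, and NAIVE are ours.

```python
def complete(states, alphabet, delta, dead="dead"):
    """Add an explicit dead state so delta is total (assumes 'dead' is unused)."""
    states = set(states) | {dead}
    full = dict(delta)
    for q in states:
        for c in alphabet:
            full.setdefault((q, c), dead)
    return states, full

def complement(states, alphabet, delta, start, accepting):
    """Complete the DFSM, then swap accepting and nonaccepting states."""
    states, delta = complete(states, alphabet, delta)
    return states, alphabet, delta, start, states - set(accepting)

def accepts(machine, w):
    """Run a (possibly partial) DFSM; a missing transition means rejection."""
    _, _, delta, state, accepting = machine
    for c in w:
        if (state, c) not in delta:   # implied dead state: reject
            return False
        state = delta[state, c]
    return state in accepting

# Hypothetical partial DFSM for the language {ab}: only two transitions given.
M = ({1, 2, 3}, {"a", "b"}, {(1, "a"): 2, (2, "b"): 3}, 1, {3})
NOT_M = complement(*M)
# Swapping without completing: strings that fall off the machine stay rejected.
NAIVE = (M[0], M[1], M[2], M[3], M[0] - M[4])
```

Here aba is not in L(M), so it belongs to the complement. NOT_M accepts it, because the run ends in the (now accepting) dead state, while NAIVE wrongly rejects it, just as the text describes.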


The closure theorems that we have now proved make it easy to take a
divide-and-conquer approach to showing that a language is regular. They also let us reuse proofs
and constructions that we’ve already done.


EXAMPLE  8.5  The  Divide-and-Conquer  Approach 

Let L = {w ∈ {a, b}* : w contains an even number of a’s and an odd number of
b’s and all a’s come in runs of three}. L is regular because it is the intersection of
two regular languages. L = L₁ ∩ L₂, where:

• L₁ = {w ∈ {a, b}* : w contains an even number of a’s and an odd number of
b’s}, and

• L₂ = {w ∈ {a, b}* : all a’s come in runs of three}.

We already know that L₁ is regular, since we showed an FSM that accepts it in
Example 5.9:


Of course, we could start with this machine and modify it so that it accepts L.
But an easier way is to exploit a divide-and-conquer approach. We’ll just use the
machine we have and then build a second simple machine, this one to accept L₂.
Then we can prove that L is regular by exploiting the fact that the regular languages
are closed under intersection. The following machine accepts L₂:


The  closure  theorems  are  powerful,  but  they  say  only  what  they  say.  We  have  stated 
each  of  the  closure  theorems  in  as  strong  a  form  as  possible.  Any  similar  claims  that  are 
not  implied  by  the  theorems  as  we  have  stated  them  are  almost  certainly  false,  which 
can  usually  be  shown  easily  by  finding  a  simple  counterexample. 

EXAMPLE  8.6  What  the  Closure  Theorem  for  Union  Does  Not  Say 

The  closure  theorem  for  union  says  that: 

if L₁ and L₂ are regular then L = L₁ ∪ L₂ is regular.

The theorem says nothing, for example, about what happens if L is regular. Does
that mean that L₁ and L₂ are also? The answer is maybe. We know that a+ is
regular. We will consider two cases for L₁ and L₂. First, let them be:

a+ = {a^p : p > 0 and p is prime} ∪ {a^p : p > 0 and p is not prime}.

a+ = L₁ ∪ L₂.

As we will see in the next section, neither L₁ nor L₂ is regular. But now consider:

a+ = {a^p : p > 0 and p is even} ∪ {a^p : p > 0 and p is odd}.

a+ = L₁ ∪ L₂.

In this case, both L₁ and L₂ are regular.


EXAMPLE  8.7  What  the  Closure  Theorem  for  Concatenation  Does  Not  Say 

The  closure  theorem  for  concatenation  says  that: 

if L₁ and L₂ are regular then L = L₁L₂ is regular.

But the theorem says nothing, for example, about what happens if L₂ is not regular.
Does that mean that L isn’t regular either? Again, the answer is maybe. We
first consider the following example:

{aba^n b^n : n ≥ 0} = {ab}{a^n b^n : n ≥ 0}.

L = L₁ L₂.

As we’ll see in the next section, L₂ is not regular. And, in this case, neither is L.
But now consider:

{aaa*} = {a*}{a^p : p is prime}.

L = L₁ L₂.

While again L₂ is not regular, now L is.


8.4  Showing  That  a  Language  is  Not  Regular 

We can show that a language is regular by exhibiting a regular expression or an FSM or
a finite list of the equivalence classes of ≈L or a regular grammar, or by using the
closure properties that we have proved hold for the regular languages. But how shall we
show that a language is not regular? In other words, how can we show that none of
those descriptions exists for it? It is not sufficient to argue that we tried to find one of
them and failed. Perhaps we didn’t look in the right place. We need a technique that
does not rely on our cleverness (or lack of it).

What  we  can  do  is  to  make  use  of  the  following  observation  about  the  regular  languages: 
Every  regular  language  L  can  be  accepted  by  an  FSM  M  with  a  finite  number  of  states.  If  L 
is  infinite,  then  there  must  be  at  least  one  loop  in  M.  All  sufficiently  long  strings  in  L  must 
be  characterized  by  one  or  more  repeating  patterns,  corresponding  to  the  substrings  that 
drive  M  through  its  loops.  It  is  also  true  that,  if  L  is  infinite,  then  any  regular  expression  that 
describes  L  must  contain  at  least  one  Kleene  star,  but  we  will  focus  here  on  FSMs. 

To help us visualize the rest of this discussion, consider the FSM M_LOOP, shown in
Figure 8.1 (a). M_LOOP has 5 states. It can accept an infinite number of strings. But the
longest one that it can accept without going through any loops has length 4. Now consider
the slightly different FSM M_¬, shown in Figure 8.1 (b). M_¬ also has 5 states and one loop.
But it accepts only one string, aab. The only string that can drive M_¬ through its loop is ε.
No matter how many times M_¬ goes through the loop, it cannot accept any longer strings.

To simplify the following discussion, we will consider only DFSMs, which have no
ε-transitions. Each transition step that a DFSM takes corresponds to exactly one
character in its input. Since any language that can be accepted by an NDFSM can also be
accepted by a DFSM, this restriction will not affect our conclusions.


FIGURE 8.1 (a), (b)  What is the longest string that a 5-state FSM can accept?


THEOREM 8.5 Long Strings Force Repeated States

Theorem: Let M = (K, Σ, δ, s, A) be any DFSM. If M accepts any string of
length |K| or greater, then that string will force M to visit some state more than
once (thus traversing at least one loop).

Proof: M must start in one of its states. Each time it reads an input character, it
visits some state. So, in processing a string of length n, M creates a total of n + 1
state visits (the initial one plus one for each character it reads). If n + 1 > |K|,
then, by the pigeonhole principle, some state must get more than one visit. So, if
n ≥ |K|, then M must visit at least one state more than once.


Let M = (K, Σ, δ, s, A) be any DFSM. Suppose that there exists some “long” string
w (i.e., |w| ≥ |K|) such that w ∈ L(M). Then M must go through at least one loop
when it reads w. So there is some substring y of w that drove M through at least one
loop. Suppose we excise y from w. The resulting string must also be in L(M) since M
can accept it just as it accepts w but skipping one pass through one loop. Further,
suppose that we splice in one or more extra copies of y, immediately adjacent to the
original one. All the resulting strings must also be in L(M) since M can accept them by
going through its loop one or more additional times. Using an analogy with a pump,
we’ll say that we can pump y out once or in an arbitrary number of times and the
resulting string must still be in L.

To make this concrete, let’s look again at M_LOOP, which accepts, for example, the
string babbab. babbab is “long” since its length is 6 and |K| = 5. The second b drove
M_LOOP through its loop. Call the string (in this case b) that drove M_LOOP through its
loop y. We can pump it out, producing babab, which is also accepted by M_LOOP. Or we
can pump in as many copies of b as we like, generating such strings as babbbab,
babbbbbab, and so forth. M_LOOP also accepts all of them. Returning to the original
string babbab, the third b also drove M_LOOP through its loop. We could also pump it (in
or out) and get a similar result.

This property of FSMs, and the languages that they can accept, is the basis for a
powerful tool for showing that a language is not regular. If a language contains even
one long (to be defined precisely below) string that cannot be pumped in the fashion
that we have just described, then it is not accepted by any FSM and so is not regular.
We formalize this idea, as the Pumping Theorem, in the next section.

8.4.1  The  Pumping  Theorem  for  Regular  Languages 

THEOREM 8.6 The Pumping Theorem for Regular Languages

Theorem: If L is a regular language, then:

∃k ≥ 1 (∀ strings w ∈ L, where |w| ≥ k (∃x, y, z (w = xyz,
                                                   |xy| ≤ k,
                                                   y ≠ ε, and
                                                   ∀q ≥ 0 (xy^q z ∈ L)))).

Proof: The proof is the argument that we gave above: If L is regular then it is accepted
by some DFSM M = (K, Σ, δ, s, A). Let k be |K|. Let w be any string in L of length
k or greater. By Theorem 8.5, to accept w, M must traverse some loop at least once.
We can carve w up and assign the name y to the first substring to drive M through a
loop. Then x is the part of w that precedes y and z is the part of w that follows y. We
show that each of the last three conditions must then hold:

• |xy| ≤ k: M must not only traverse a loop eventually when reading w, it must
do so for the first time by at least the time it has read k characters. It can read
k − 1 characters without revisiting any states. But the kth character must, if
no earlier character already has, take M to a state it has visited before.
Whatever character does that is the last in one pass through some loop.

• y ≠ ε: Since M is deterministic, there are no loops that can be traversed by ε.

• ∀q ≥ 0 (xy^q z ∈ L): y can be pumped out once (which is what happens if q = 0)
or in any number of times (which happens if q is greater than 1) and the
resulting string must be in L since it will be accepted by M. It is possible that we could
chop y out more than once and still generate a string in L, but without knowing
how much longer w is than k, we don’t know any more than that it can be
pumped out once.
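
The way the proof carves w into x, y, and z can be traced on a concrete machine. The following Python sketch is only an illustration: the DFSM DELTA (three states, accepting strings over {a, b} that end in ab) is a hypothetical example of ours, as are the function names pump_split and accepts. The split is found by running the machine and stopping at the first state that is visited twice, exactly as in the argument above.

```python
def pump_split(delta, start, w):
    """Split w into (x, y, z), where y drives the DFSM through its first loop.

    delta maps (state, character) to the next state and must be total.
    Returns None if no state repeats, which can only happen when |w| < |K|.
    """
    seen = {start: 0}          # state -> number of characters read at first visit
    state = start
    for i, c in enumerate(w, start=1):
        state = delta[state, c]
        if state in seen:
            j = seen[state]    # w[j:i] led from this state back to itself
            return w[:j], w[j:i], w[i:]
        seen[state] = i
    return None

def accepts(delta, start, accepting, w):
    """Run the DFSM on w and report whether it ends in an accepting state."""
    state = start
    for c in w:
        state = delta[state, c]
    return state in accepting

# Hypothetical 3-state DFSM over {a, b} accepting strings that end in ab.
DELTA = {(0, "a"): 1, (0, "b"): 0,
         (1, "a"): 1, (1, "b"): 2,
         (2, "a"): 1, (2, "b"): 0}
x, y, z = pump_split(DELTA, 0, "bbab")
```

For w = bbab the first repeated state is reached after one character, so x = ε, y = b, and z = bab. Pumping y out gives bab, and pumping it in gives bbbab, bbbbab, and so on; the machine accepts all of them.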


The Pumping Theorem tells us something that is true of every regular language.
Generally, if we already know that a language is regular, we won’t particularly care about
what the Pumping Theorem tells us about it. But suppose that we are interested in
some language L and we want to know whether or not it is regular. If we could show
that the claims made in the Pumping Theorem are not true of L, then we would know
that L is not regular. It is in arguments such as this that we will find the Pumping
Theorem very useful. In particular, we will use it to construct proofs by contradiction. We
will say, “If L were regular, then it would possess certain properties. But it does not
possess those properties. Therefore, it is not regular.”


EXAMPLE 8.8 A^nB^n is not Regular

Let L be A^nB^n = {a^n b^n : n ≥ 0}. We can use the Pumping Theorem to show that
L is not regular. If it were, then there would exist some k such that any string w,
where |w| ≥ k, must satisfy the conditions of the theorem. We show one string w
that does not. Let w = a^k b^k. Since |w| = 2k, w is long enough and it is in L, so it
must satisfy the conditions of the Pumping Theorem. So there must exist x, y, and
z, such that w = xyz, |xy| ≤ k, y ≠ ε, and ∀q ≥ 0 (xy^q z ∈ L). But we show that
no such x, y, and z exist. Since we must guarantee that |xy| ≤ k, y must occur
within the first k characters and so y = a^p for some p. Since we must guarantee
that y ≠ ε, p must be greater than 0. Let q = 2. (In other words, we pump in one
extra copy of y.) The resulting string is a^(k+p) b^k. The last condition of the Pumping
Theorem states that this string must be in L, but it is not since it has more a’s than
b’s. Thus there exists at least one long string in L that fails to satisfy the conditions
of the Pumping Theorem. So L = A^nB^n is not regular.
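
The argument in Example 8.8 can be spot-checked by brute force for particular values of k. The Python sketch below is a sanity check under our own assumptions, not a replacement for the proof, which covers every k at once. The function names in_anbn, splits, and survives_pumping are hypothetical; splits assumes |w| ≥ k.

```python
def in_anbn(w):
    """Membership test for {a^n b^n : n >= 0}."""
    n = len(w) // 2
    return w == "a" * n + "b" * n

def splits(w, k):
    """Yield every (x, y, z) with w = xyz, |xy| <= k, and y nonempty."""
    for i in range(k + 1):               # i = |x|
        for j in range(i + 1, k + 1):    # j = |xy|
            yield w[:i], w[i:j], w[j:]

def survives_pumping(w, k, q_values=(0, 2)):
    """True if some split keeps x y^q z in the language for all tested q."""
    return any(all(in_anbn(x + y * q + z) for q in q_values)
               for x, y, z in splits(w, k))
```

For w = a^k b^k every admissible y lies inside the block of a’s, so pumping changes the number of a’s without changing the number of b’s, and survives_pumping returns False for each k that we try.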
The Pumping Theorem is a powerful tool for showing that a language is not regular.
But, as with any tool, using it effectively requires some skill. To see how the theorem
can be used, let’s state it again in its most general terms:

For any language L, if L is regular, then every “long” string in L is pumpable.

So, to show that L is not regular, it suffices to find a single long string w that is in
L but is not pumpable. To show that a string is not pumpable, we must show that
there is no way to carve it up into x, y, and z in such a way that all three of the
conditions of the theorem are met. It is not sufficient to pick a particular y and show that it
doesn’t work. (We focus on y since, once it has been chosen, everything to the left of
it is x and everything to the right of it is z.) We must show that there is no value for y
that works. To do that, we consider all the logically possible classes of values for y
(sometimes there is only one such class, but sometimes several must be considered).
Then we show that each of them fails to satisfy at least one of the three conditions of
the theorem. Generally we do that by assuming that y does satisfy the first two
conditions, namely that it occurs within the first k characters and is not ε. Then we
consider the third requirement, namely that, for all values of q, xy^q z is in L. To show that
it is not possible to satisfy that requirement, it is sufficient to find a single value of q
such that the resulting string is not in L. Typically, this can be done by setting q to 0
(thus pumping out once) or to 2 (pumping in once), although sometimes some other
value of q must be considered.

In a nutshell then, to use the Pumping Theorem to show that a language L is not
regular, we must:

1. Choose a string w, where w ∈ L and |w| ≥ k. Note that we do not know what k
is; we know only that it exists. So we must state w in terms of k.

2. Divide the possibilities for y into a set of equivalence classes so that all strings in
a class can be considered together.

3. For each such class of possible y values, where |xy| ≤ k and y ≠ ε:

Choose a value for q such that xy^q z is not in L.

In Example 8.8, y had to fall in the initial a region of w, so that was the only case that
needed to be considered. But, had we made a less judicious choice for w, our proof
would not have been so simple. Let’s look at another proof, with a different w:


EXAMPLE  8.9  A  Less  Judicious  Choice  for  w 

Again let L be AnBn = {a^n b^n : n ≥ 0}. If AnBn were regular, then there would
exist some k such that any string w, where |w| ≥ k, must satisfy the conditions of
the theorem. Let w = a^⌈k/2⌉ b^⌈k/2⌉. (We must use ⌈k/2⌉, i.e., the smallest integer
greater than or equal to k/2, rather than truncating the division, since k might be
odd.) Since |w| ≥ k and w is in L, w must satisfy the conditions of the Pumping
Theorem. So there must exist x, y, and z, such that w = xyz, |xy| ≤ k, y ≠ ε, and
∀q ≥ 0 (xy^q z ∈ L). We show that no such x, y, and z exist. This time, if they did, y
could be almost anywhere in w (since all the Pumping Theorem requires is that it
occur in the first k characters and there are only at most k + 1 characters). So we
must consider three cases and show that, in all three, there is no y that satisfies all
conditions of the Pumping Theorem. A useful way to describe the cases is to imagine
w divided into two regions:

aaaaa ... aaaaaa | bbbbb ... bbbbbb

        1        |        2

Now  we  see  that  y  can  fall: 

•  Exclusively  in  region  1:  In  this  case,  the  proof  is  identical  to  the  proof  we  did 
for  Example  8.8. 

• Exclusively in region 2: Then y = b^p for some p. Since y ≠ ε, p must be greater
than 0. Let q = 2. The resulting string is a^⌈k/2⌉ b^(⌈k/2⌉+p). But this string is not
in L, since it has more b's than a's.

• Straddling the boundary between regions 1 and 2: Then y = a^p b^r for some
non-zero p and r. Let q = 2. The resulting string will have interleaved a's and
b's, and so is not in L.

There  exists  at  least  one  long  string  in  L  that  fails  to  satisfy  the  conditions  of 
the  Pumping  Theorem.  So  L  =  AnBn  is  not  regular. 
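
The case analysis of Example 8.9 can be checked for small sample values of k. The
sketch below is an illustration only; the function names are hypothetical. It builds
w = a^⌈k/2⌉ b^⌈k/2⌉, sorts every legal y into one of the three cases, and records
whether pumping in once (q = 2) takes the string out of AnBn in each case.

```python
def in_anbn(s):
    """Membership test for AnBn = {a^n b^n : n >= 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


def region_of(y):
    """Name the case from Example 8.9 that a nonempty substring y falls into."""
    if set(y) == {"a"}:
        return "region 1"
    if set(y) == {"b"}:
        return "region 2"
    return "straddling"


def cases_refuted_by_q2(k):
    """Map each case that occurs to True iff every y in that case fails with q = 2."""
    half = (k + 1) // 2                  # integer form of ceil(k/2)
    w = "a" * half + "b" * half
    refuted = {}
    for i in range(k + 1):
        for j in range(i + 1, min(k, len(w)) + 1):
            x, y, z = w[:i], w[i:j], w[j:]
            fails = not in_anbn(x + y * 2 + z)
            case = region_of(y)
            refuted[case] = refuted.get(case, True) and fails
    return refuted
```

For k = 6 all three cases occur and all are refuted. For k = 1 only region 1 can occur,
because y must lie within the first character.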


To  make  maximum  use  of  the  Pumping  Theorem’s  requirement  that  y  fall  in  the  first 
k  characters,  it  is  often  a  good  idea  to  choose  a  string  w  that  is  substantially  longer  than 
the  k  characters  required  by  the  theorem.  In  particular,  if  w  can  be  chosen  so  that  there 
is  a  uniform  first  region  of  length  at  least  k,  it  may  be  possible  to  consider  just  a  single 
case  for  where  y  can  fall. 


The Pumping Theorem inspires poets, as we'll see in Chapter 10.


AnBn is a simple language that illustrates the kind of property that characterizes
languages that aren't regular. It isn't of much practical importance, but it is typical of a
family of languages, many of which are of more practical significance. In the next example,
we consider Bal, the language of balanced parentheses. The structure of Bal is very
similar to that of AnBn. Bal is important because most languages for describing arithmetic
expressions, Boolean queries, and markup systems require balanced delimiters.


EXAMPLE  8.10  The  Balanced  Parenthesis  Language  is  Not  Regular 

Let L be Bal = {w ∈ {), (}* : the parentheses are balanced}. If L were regular,
then there would exist some k such that any string w, where |w| ≥ k, must satisfy
the conditions of the theorem. Bal contains complex strings like (())(()()). But it is
almost always easier to use the Pumping Theorem if we pick as simple a string as
possible. So, let w = (^k )^k. Since |w| = 2k and w is in L, w must satisfy the
conditions of the Pumping Theorem. So there must exist x, y, and z, such that
w = xyz, |xy| ≤ k, y ≠ ε, and ∀q ≥ 0 (xy^q z ∈ L). But we show that no such x, y, and
z exist. Since |xy| ≤ k, y must occur within the first k characters and so y = (^p for
some p. Since y ≠ ε, p must be greater than 0. Let q = 2. (In other words, we
pump in one extra copy of y.) The resulting string is (^(k+p) )^k. The last condition of
the Pumping Theorem states that this string must be in L, but it is not since it has
more ('s than )'s. There exists at least one long string in L that fails to satisfy the
conditions of the Pumping Theorem. So L = Bal is not regular.
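
A membership test for Bal makes the obstacle visible: the natural program keeps a
depth counter that can grow without bound, which is exactly what a finite state
machine cannot do. The sketch below is illustrative and its names are hypothetical. It
also checks, for a sample k, that pumping in one extra copy of any legal y unbalances
w = (^k )^k.

```python
def balanced(s):
    """Return True iff s, a string over '(' and ')', is balanced."""
    depth = 0                       # unbounded counter: not available to an FSM
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:               # a ')' appeared with no matching '('
            return False
    return depth == 0


def every_split_refuted(k):
    """For w = (^k )^k, check that q = 2 takes every legal split out of Bal."""
    w = "(" * k + ")" * k
    for i in range(k):
        for j in range(i + 1, k + 1):           # y = w[i:j] lies in the first k chars
            pumped = w[:i] + w[i:j] * 2 + w[j:]
            if balanced(pumped):
                return False
    return True
```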


EXAMPLE  8.11  The  Even  Palindrome  Language  is  Not  Regular 

Let L be PalEven = {ww^R : w ∈ {a, b}*}. PalEven is the language of even-length
palindromes of a's and b's. We can use the Pumping Theorem to show that
PalEven is not regular. If it were, then there would exist some k such that any
string w, where |w| ≥ k, must satisfy the conditions of the theorem. We show one
string w that does not. (Note here that the variable w used in the definition of L
is different from the variable w mentioned in the Pumping Theorem.) We will
choose w so that we only have to consider one case for where y could fall. Let
w = a^k b^k b^k a^k. Since |w| = 4k and w is in L, w must satisfy the conditions of the
Pumping Theorem. So there must exist x, y, and z, such that w = xyz, |xy| ≤ k,
y ≠ ε, and ∀q ≥ 0 (xy^q z ∈ L). Since |xy| ≤ k, y must occur within the first k
characters and so y = a^p for some p. Since y ≠ ε, p must be greater than 0. Let
q = 2. The resulting string is a^(k+p) b^k b^k a^k. If p is odd, then this string is not in
PalEven because all strings in PalEven have even length. If p is even then it is at
least 2, so the first half of the string has more a's than the second half does, so it is
not in PalEven. So L = PalEven is not regular.
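
The choice of w matters. The following sketch (hypothetical names, sample value
k = 4) lists the splits of a string that survive pumping for a set of tested q values.
For w = a^k b^k b^k a^k no split survives q = 2. For a uniform string such as a^(2k),
which is also in PalEven, every y of even length survives all tested values, so that
string yields no contradiction.

```python
def in_pal_even(s):
    """Membership test for PalEven: even-length palindromes."""
    return len(s) % 2 == 0 and s == s[::-1]


def surviving_splits(w, k, qs):
    """List the splits (x, y, z), |xy| <= k and y nonempty, that stay in PalEven
    for every q in qs."""
    found = []
    for i in range(k + 1):
        for j in range(i + 1, min(k, len(w)) + 1):
            x, y, z = w[:i], w[i:j], w[j:]
            if all(in_pal_even(x + y * q + z) for q in qs):
                found.append((x, y, z))
    return found


k = 4
good_w = "a" * k + "b" * k + "b" * k + "a" * k
uniform_w = "a" * (2 * k)
print(surviving_splits(good_w, k, qs=(2,)))           # no split survives
print(surviving_splits(uniform_w, k, qs=range(6)))    # splits with |y| even survive
```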


The Pumping Theorem says that, for any language L, if L is regular, then all long
strings in L must be pumpable. Our strategy in using it to show that a language L is not
regular is to find one string that fails to meet that requirement. Often, there are many
long strings that are pumpable. If we try to work with them, we will fail to derive the
contradiction that we seek. In that case, we will know nothing about whether or not L
is regular. To find a w that is not pumpable, think about what property of L is not
checkable by an FSM and choose a w that exhibits that property. Consider again our
last example. The thing that an FSM cannot do is to remember an arbitrarily long first
half and check it against the second half. So we chose a w that would have forced it to
do that. Suppose instead that we had let w = a^k a^k. It is in L and long enough. But y
could be aa and we could pump it out or in and all the resulting strings would be in L.

So far, all of our Pumping Theorem proofs have set q to 2. But that is not always
the thing to do. Sometimes it will be necessary to set it to 0. (In other words, we will
pump y out.)


EXAMPLE  8.12  The  Language  with  More  a's  Than  b's  is  Not  Regular 

Let L = {a^n b^m : n > m}. We can use the Pumping Theorem to show that L is
not regular. If it were, then there would exist some k such that any string w,
where |w| ≥ k, must satisfy the conditions of the theorem. We show one string
w that does not. Let w = a^(k+1) b^k. Since |w| = 2k + 1 and w is in L, w must
satisfy the conditions of the Pumping Theorem. So there must exist x, y, and z,
such that w = xyz, |xy| ≤ k, y ≠ ε, and ∀q ≥ 0 (xy^q z ∈ L). Since |xy| ≤ k, y
must occur within the first k characters and so y = a^p for some p. Since y ≠ ε, p
must be greater than 0. There are already more a's than b's, as required by the
definition of L. If we pump in, there will be even more a's and the resulting
string will still be in L. But we can set q to 0 (and so pump out). The resulting
string is then a^(k+1-p) b^k. Since p > 0, k + 1 - p ≤ k, so the resulting string no
longer has more a's than b's and so is not in L. There exists at least one long
string in L that fails to satisfy the conditions of the Pumping Theorem. So L is
not regular.


Notice that the proof that we just did depended on our having chosen a w that is just
barely in L. It had exactly one more a than b. So y could be any string of up to k a's. If
we pumped in extra copies of y, we would have gotten strings that were still in L. But if
we pumped out even a single a, we got a string that was not in L, and so we were able
to complete the proof. Suppose, though, that we had chosen w = a^(2k) b^k. Again,
pumping in results in strings in L. And now, if y were simply a, we could pump out and get a
string that was still in L. So that proof attempt fails. In general, it is a good idea to
choose a w that barely meets the requirements for L. That makes it more likely that
pumping will create a string that is not in L.
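
The difference between the two choices of w can be tabulated for a sample k. This
sketch is illustrative only, and its names are hypothetical. For each legal split it lists
which of the tested values q = 0 and q = 2 take the string out of L. With
w = a^(k+1) b^k every split is refuted by q = 0 and none by q = 2. With
w = a^(2k) b^k the split with y = a is refuted by neither.

```python
import re


def in_L(s):
    """Membership test for L = {a^n b^m : n > m}."""
    m = re.fullmatch(r"(a*)(b*)", s)
    return m is not None and len(m.group(1)) > len(m.group(2))


def refuting_qs(w, k, qs=(0, 2)):
    """Map each split, keyed by (|x|, |xy|), to the q values in qs that refute it."""
    table = {}
    for i in range(k + 1):
        for j in range(i + 1, min(k, len(w)) + 1):
            x, y, z = w[:i], w[i:j], w[j:]
            table[(i, j)] = [q for q in qs if not in_L(x + y * q + z)]
    return table


k = 3
barely_in = refuting_qs("a" * (k + 1) + "b" * k, k)    # every entry is [0]
well_inside = refuting_qs("a" * (2 * k) + "b" * k, k)  # entry (0, 1) is []
```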

Sometimes  values  of  q  other  than  0  or  2  may  also  be  required. 

EXAMPLE  8.13  The  Prime  Number  of  a's  Language  is  Not  Regular 

Let L be Prime_a = {a^n : n is prime}. We can use the Pumping Theorem to show
that L is not regular. If it were, then there would exist some k such that any string
w, where |w| ≥ k, must satisfy the conditions of the theorem. We show one string
w that does not. Let w = a^j, where j is the smallest prime number greater than
k + 1. Since |w| > k, w must satisfy the conditions of the Pumping Theorem. So
there must exist x, y, and z, such that w = xyz, |xy| ≤ k and y ≠ ε. y = a^p for some
p. The Pumping Theorem further requires that ∀q ≥ 0 (xy^q z ∈ L). So, ∀q ≥ 0
(a^(|x| + |z| + q·|y|) must be in L). That means that |x| + |z| + q·|y| must be prime.

But suppose that q = |x| + |z|. Then:

|x| + |z| + q·|y| = |x| + |z| + (|x| + |z|)·|y|

= (|x| + |z|)·(1 + |y|),

which is composite (non-prime) if both factors are greater than 1. (|x| + |z|) > 1
because |w| > k + 1 and |y| ≤ k. (1 + |y|) > 1 because |y| > 0. So, for at least
that one value of q, the resulting string is not in L. So L is not regular.
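
The arithmetic in Example 8.13 can be checked numerically. The sketch below is an
illustration with hypothetical names. For a given k it takes |w| = j, tries every
possible length of y, sets q = |x| + |z|, and confirms both the factorization and
that the pumped length is not prime.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def smallest_prime_above(m):
    """Return the smallest prime strictly greater than m."""
    n = m + 1
    while not is_prime(n):
        n += 1
    return n


def pumped_lengths_all_composite(k):
    j = smallest_prime_above(k + 1)      # |w| = j
    for y_len in range(1, k + 1):        # 0 < |y| <= |xy| <= k
        rest = j - y_len                 # |x| + |z|
        q = rest                         # the value of q chosen in the example
        pumped_len = rest + q * y_len
        if pumped_len != rest * (1 + y_len) or is_prime(pumped_len):
            return False
    return True
```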


When we do a Pumping Theorem proof that a language L is not regular, we have
two choices to make: a value for w and a value for q. As we have just seen, there are
some useful heuristics that can guide our choices:

• To choose w:

   • Choose a w that is in the part of L that makes it not regular.

   • Choose a w that is only barely in L.

   • Choose a w with as homogeneous as possible an initial region of length at least k.

• To choose q:

   • Try letting q be either 0 or 2.

   • If that doesn't work, analyze L to see if there is some other specific value that
     will work.

8.4.2  Using  Closure  Properties 

Sometimes the easiest way to prove that a language L is not regular is to use the
closure theorems for regular languages, either alone or in conjunction with the Pumping
Theorem. The fact that the regular languages are closed under intersection is
particularly useful.


EXAMPLE  8.14  Using  Intersection  to  Force  Order  Constraints 

Let L = {w ∈ {a, b}* : #a(w) = #b(w)}. If L were regular, then L' = L ∩ a*b*
would also be regular. But L' = {a^n b^n : n ≥ 0}, which we have already shown is not
regular. So L isn't either.
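
The identity L ∩ a*b* = AnBn can be spot-checked by brute force over all short
strings. The sketch below is a finite sanity check, not a proof, and its names are
hypothetical.

```python
import re
from itertools import product


def equal_counts(w):
    """Membership test for L = {w in {a, b}* : #a(w) = #b(w)}."""
    return w.count("a") == w.count("b")


def in_a_star_b_star(w):
    return re.fullmatch(r"a*b*", w) is not None


def intersection_up_to(n):
    """All strings of length <= n that are in both L and a*b*."""
    found = set()
    for length in range(n + 1):
        for letters in product("ab", repeat=length):
            w = "".join(letters)
            if equal_counts(w) and in_a_star_b_star(w):
                found.add(w)
    return found


print(sorted(intersection_up_to(6), key=len))   # ['', 'ab', 'aabb', 'aaabbb']
```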


EXAMPLE  8.15  Using  Closure  Under  Complement 

Let L = {a^i b^j : i, j ≥ 0 and i ≠ j}. It seems unlikely that L is regular since any
machine to accept it would have to count the a's. It is possible to use the Pumping
Theorem to prove that L is not regular but it is not easy to see how. Suppose, for
example, that we let w = a^(k+1) b^k. But then y could be aa and it would pump since
a^(k-1) b^k is in L, and so is a^(k+1+2(q-1)) b^k, for all nonnegative values of q.

Instead, let w = a^k b^(k+k!). Then y = a^p for some nonzero p. Let q = (k!/p) + 1
(in other words, pump in (k!/p) times). Note that (k!/p) must be an integer because
p ≤ k. The number of a's in the resulting string is k + (k!/p)p = k + k!. So the
resulting string is a^(k+k!) b^(k+k!), which has equal numbers of a's and b's and so is not in L.
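
The choice q = (k!/p) + 1 works for every possible p at once, which is easy to
confirm numerically. The sketch below is illustrative, and the function name is
hypothetical. It computes the letter counts after pumping without building the
(very long) strings.

```python
from math import factorial


def counts_after_pumping(k, p):
    """For w = a^k b^(k+k!) and y = a^p, return (q, #a, #b) after pumping
    with q = k!/p + 1."""
    assert 1 <= p <= k
    assert factorial(k) % p == 0         # k!/p is an integer because p <= k
    q = factorial(k) // p + 1
    a_count = (k - p) + q * p            # drop the original y, add q copies of it
    b_count = k + factorial(k)
    return q, a_count, b_count


for k in range(1, 8):
    for p in range(1, k + 1):
        q, a_count, b_count = counts_after_pumping(k, p)
        assert a_count == b_count        # equal counts, so the string is not in L
```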

The closure theorems provide an easier way. We observe that if L were regular,
then ¬L would also be regular, since the regular languages are closed under
complement. ¬L = {a^n b^n : n ≥ 0} ∪ {strings of a's and b's that do not have all
a's in front of all b's}. If ¬L is regular, then ¬L ∩ a*b* must also be regular. But
¬L ∩ a*b* = {a^n b^n : n ≥ 0}, which we have already shown is not regular. So
neither is ¬L or L.


Sometimes, using the closure theorems is more than a convenience. There are
languages that are not regular but that do meet all the conditions of the Pumping
Theorem. The Pumping Theorem alone is insufficient to prove that those languages are not
regular, but it may be possible to complete a proof by exploiting the closure properties
of the regular languages.


EXAMPLE  8.16  Sometimes  We  Must  Use  the  Closure  Theorems 

Let L = {a^i b^j c^k : i, j, k ≥ 0 and (if i = 1 then j = k)}. Every string of length at
least 1 that is in L is pumpable. It is easier to see this if we rewrite the final
condition as (i ≠ 1) or (j = k). Then we observe:

• If i = 0 then: If j ≠ 0, let y be b; otherwise, let y be c. Pump in or out. Then i
will still be 0 and thus not equal to 1, so the resulting string is in L.

• If i = 1 then: Let y be a. Pump in or out. Then i will no longer equal 1, so the
resulting string is in L.

• If i = 2 then: Let y be aa. Pump in or out. Then i cannot equal 1, so the
resulting string is in L.

• If i > 2 then: Let y be a. Pump out once or in any number of times. Then i
cannot equal 1, so the resulting string is in L.
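
The four cases can be checked exhaustively on short strings. The following sketch
is an illustration with hypothetical names. It picks y according to the cases above
(always with x = ε), pumps with q = 0 through 5, and confirms that every result
stays in L. It tests a finite sample only.

```python
import re
from itertools import product


def in_L(s):
    """Membership test for L = {a^i b^j c^k : if i = 1 then j = k}."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    if m is None:
        return False
    i, j, k = (len(group) for group in m.groups())
    return i != 1 or j == k


def choose_y(w):
    """Choose y as in the four cases; w is a nonempty string in L."""
    i = len(w) - len(w.lstrip("a"))      # number of leading a's
    if i == 2:
        return "aa"
    return w[0]      # 'a' when i = 1 or i > 2; the first b or c when i = 0


def all_pumpable(max_len, max_q=5):
    for n in range(1, max_len + 1):
        for letters in product("abc", repeat=n):
            w = "".join(letters)
            if not in_L(w):
                continue
            y = choose_y(w)
            z = w[len(y):]               # x is empty, so w = y + z
            if not all(in_L(y * q + z) for q in range(max_q + 1)):
                return False
    return True
```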

But L is not regular. One way to prove this is to use the fact that the regular
languages are closed under intersection. So, if L were regular, then L' = L ∩ ab*c*
= {a b^j c^k : j, k ≥ 0 and j = k} would also be regular. But it is not, which we can
show using the Pumping Theorem. Let w = a b^k c^k. Then y must occur in the first
k characters of w. If y includes the initial a, pump in once. The resulting string is not
in L' because it contains more than one a. If y does not include the initial a, then it
must be b^p, where 0 < p < k. Pump in once. The resulting string is not in L'
because it contains more b's than c's. Since L' is not regular, neither is L.

Another way to show that L is not regular is to use the fact that the regular
languages are closed under reverse. L^R = {c^k b^j a^i : i, j, k ≥ 0 and (if i = 1 then
j = k)}. If L were regular then L^R would also be regular. But it is not, which we
can show using the Pumping Theorem. Let w = c^k b^k a. y must occur in the first
k characters of w, so y = c^p, where 0 < p ≤ k. Set q to 0. The resulting string
contains a single a, so the number of b's and c's must be equal for it to be in L^R.
But there are fewer c's than b's. So the resulting string is not in L^R. L^R is not
regular. Since L^R is not regular, neither is L.

8.5  Exploiting  Problem-Specific  Knowledge 

Given some new language L, the theory that we have been describing provides the
skeleton for an analysis of L. If L is simple, that may be enough. But if L is based on a
real problem, any analysis of it will also depend on knowledge of the task domain. We
got a hint of this in Example 8.13, where we had to use some knowledge about
numbers and algebra. Other problems also require mathematical facts.

EXAMPLE  8.17  The  Octal  Representation  of  a  Number  Divisible  by  7 

Let L = {w ∈ {0, 1, 2, 3, 4, 5, 6, 7}* : w is the octal representation of a nonnegative
integer that is divisible by 7}. The first several strings in L are: 0, 7, 16, 25, 34,
43, 52, and 61. Is L regular? Yes, because there is a simple, 7-state DFSM M that
accepts L. The structure of M takes advantage of the fact that w is in L iff the sum
of its digits, viewed as numbers, is divisible by 7. So the states of M correspond to
the modulo 7 sum of the digits so far. We omit the details.
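
The digit-sum fact holds because 8 leaves remainder 1 when divided by 7, so each
octal digit contributes its own value modulo 7 regardless of its position. A minimal
simulation of such a machine is sketched below. The function name is hypothetical,
and the sketch assumes that the empty string is not a valid representation, a detail
that the example leaves open.

```python
def accepts_octal_multiple_of_7(w):
    """Simulate a DFSM whose state is the sum of the digits read so far, mod 7."""
    if w == "" or any(ch not in "01234567" for ch in w):
        return False                     # assumption: reject empty or non-octal input
    state = 0                            # start state: digit sum 0
    for ch in w:
        state = (state + int(ch)) % 7    # one transition per input digit
    return state == 0                    # the only accepting state


first = [oct(n)[2:] for n in range(50) if accepts_octal_multiple_of_7(oct(n)[2:])]
print(first)   # ['0', '7', '16', '25', '34', '43', '52', '61']
```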


Sometimes  L  corresponds  to  a  problem  from  a  domain  other  than  mathematics,  in 
which  case  facts  from  that  domain  will  be  important. 

EXAMPLE  8.18  A  Music  Language 

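Let Σ = {|, whole note, half note, quarter note, eighth note, sixteenth note,
thirty-second note}, where | is the measure bar. Let L = {w : w represents a song
written in 4/4 time}. L is regular. It can be accepted by an FSM that checks for 4
beats between measure bars, where a whole note counts as 4, a half note counts as 2,
a quarter note counts as 1, an eighth note counts as 1/2, a sixteenth note counts as
1/4, and a thirty-second note counts as 1/8.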


Other techniques described in this book can also be applied to the language
of music. (N.1)


EXAMPLE 8.19 English

Is English a regular language? If we assume that there is a longest sentence, then
English is regular because it is finite. If we assume that there is not a longest
sentence and that the recursive constructs in English can be arbitrarily nested, then it
is easy to show that English is not regular. We consider a very small subset of
English, sentences such as:

•  The  rat  ran. 

•  The  rat  that  the  cat  saw  ran. 

•  The  rat  that  the  cat  that  the  dog  chased  saw  ran. 

There is a limit on how deeply nested sentences such as this can be if people
are going to be able to understand them easily. But the grammar of English
imposes no hard upper bound. So we must allow any number of embedded
sentences. Let A = {cat, rat, dog, bird, bug, pony} and let V = {ran, saw, chased,
flew, sang, frolicked}. If English were regular, then L = English ∩ {The A
(that the A)* V* V} would also be regular. But every English sentence of this
form has the same number of nouns as verbs. So we have that:

L = {The A (that the A)^n V^n V, n ≥ 0}.

We can show that L is not regular by pumping. The outline of the proof is the
same as the one we used in Example 8.9 to show that AnBn is not regular. Let
w = The cat (that the rat)^k saw^k ran. y must occur within the first k characters
of w. If y is anything other than (the A that)^p, or (A that the)^p, or (that
the A)^p, for some nonzero p, pump in once and the resulting string will not be of
the correct form. If y is equal to one of those strings, pump in once and the
number of nouns will no longer equal the number of verbs. In either case the resulting
string is not in L. So English is not regular.


Is there a longest English sentence? Are there other ways of showing that
English isn't regular? Would it be useful to describe English as a regular
language even if we could? (L.3.1)


8.6  Functions  on  Regular  Languages 

In Section 8.3, we considered some important functions that can be applied to the
regular languages and we showed that the class of regular languages is closed under
them. In this section, we will look at some additional functions and ask whether the
regular languages are closed under them. In some cases, we will see that the answer is
yes. We will prove that the answer is yes by showing a construction that builds one
FSM from another. In other cases, we will see that the answer is no, which we now
have the tools to prove.


EXAMPLE 8.20 The Function firstchars

Consider again the function firstchars, which we defined in Example 4.11. Recall
that firstchars(L) = {w : ∃y ∈ L (y = cx, c ∈ Σ_L, x ∈ Σ_L*, and w ∈ c*)}. In other
words, to compute firstchars(L), we find all the characters that can be initial
characters of some string in L. For each such character c, c* ⊆ firstchars(L).

The regular languages are closed under firstchars. The proof is by construction.
If L is a regular language, then there exists some DFSM M = (K, Σ, δ, s, A) that
accepts L. We construct, from M, a new DFSM M' = (K', Σ, δ', s', A') that
accepts firstchars(L). The algorithm to construct M' is:

1. Mark all the states in M from which there exists some path to some accepting
   state.

/* Find all the characters that are initial characters in some string in L.

2. clist = ∅.

3. For each character c in Σ do:

      If there is a transition from s, with label c, to some state q, and q was
      marked in step 1 then:

         clist = clist ∪ {c}.

/* Build M'.

4. If clist = ∅ then construct M' with a single state s', which is not accepting.

5. Else do:

      Create a start state s' and make it the first state in A'.

      For each character c in clist do:

         Create a new state q_c and add it to A'.

         Add a transition from s' to q_c labeled c.

         Add a transition from q_c to q_c labeled c.

M' accepts exactly the strings in firstchars(L), so firstchars(L) is regular.

We can also prove that firstchars(L) must be regular by showing how to construct
a regular expression that describes it. We begin by computing clist = {c_1, c_2, ..., c_n}
as described above. Then a regular expression that describes firstchars(L) is:


c_1* ∪ c_2* ∪ ... ∪ c_n*.
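
The construction of Example 8.20 translates almost line for line into a program.
The sketch below is one possible rendering, not the only one. It assumes that a
DFSM is represented by a dictionary that maps (state, character) pairs to states;
that representation, the function names, and the sample machine for ab* are all
illustrative choices. Missing transitions are treated as leading to a dead state.

```python
def firstchars_dfsm(sigma, delta, start, accepting):
    """Build M' = (states, delta', s', A') for firstchars(L(M))."""
    # Step 1: mark the states from which some accepting state can be reached.
    marked = set(accepting)
    changed = True
    while changed:
        changed = False
        for (p, _), r in delta.items():
            if r in marked and p not in marked:
                marked.add(p)
                changed = True
    # Steps 2-3: collect the characters that can begin some string in L.
    clist = {c for c in sigma if delta.get((start, c)) in marked}
    # Step 4: no such character, so M' has one non-accepting state.
    if not clist:
        return {"s'"}, {}, "s'", set()
    # Step 5: s' and every q_c are accepting; each q_c loops on c only.
    states = {"s'"} | {"q_" + c for c in clist}
    new_delta = {}
    for c in clist:
        new_delta[("s'", c)] = "q_" + c
        new_delta[("q_" + c, c)] = "q_" + c
    return states, new_delta, "s'", set(states)


def accepts(delta, start, accepting, w):
    state = start
    for ch in w:
        if (state, ch) not in delta:     # missing transition: reject
            return False
        state = delta[(state, ch)]
    return state in accepting


# Sample machine for L = ab* over {a, b, c}; state 2 is a dead state.
sigma = {"a", "b", "c"}
delta_M = {(0, "a"): 1, (0, "b"): 2, (0, "c"): 2,
           (1, "a"): 2, (1, "b"): 1, (1, "c"): 2,
           (2, "a"): 2, (2, "b"): 2, (2, "c"): 2}
states2, delta2, start2, accepting2 = firstchars_dfsm(sigma, delta_M, 0, {1})
```

For this sample machine the only initial character is a, so the resulting machine
accepts a*.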


The algorithm that we just presented constructs one program (a DFSM), using
another program (another DFSM) as a starting point. The algorithm is straightforward.
We have omitted a detailed proof of its correctness, but that proof is also
straightforward. Suppose that, instead of representing an input language L as a DFSM, we had
represented it as an arbitrary program (written in C++ or Java or whatever) that
accepted it. It would not have been as straightforward to have designed a corresponding
algorithm to convert that program into one that accepted firstchars(L). We have just
seen another advantage of the FSM formalism.


EXAMPLE 8.21 The Function chop

Consider again the function chop, defined in Example 4.10. chop(L) = {w : ∃x ∈
L (x = x1 c x2, x1 ∈ Σ_L*, x2 ∈ Σ_L*, c ∈ Σ_L, |x1| = |x2|, and w = x1 x2)}. In other
words, chop(L) is all the odd length strings in L with their middle character
chopped out.

The  regular  languages  are  not  closed  under  chop.  To  show  this,  it  suffices  to 
show  one  counterexample,  i.e.,  one  regular  language  L  such  that  chop(L)  is  not 
regular. Let L = a*db*. L is regular since it can be described with a regular
expression.

What is chop(a*db*)? Let w be some string in a*db*. Now we observe:

• If |w| is even, then there is no middle character to chop so w contributes no
string to chop(a*db*).

• If |w| is odd and w has an equal number of a's and b's, then its middle
character is d. Chopping out the d produces, and contributes to chop(a*db*), a string
in {a^n b^n : n ≥ 0}.

• If |w| is odd and w does not have an equal number of a's and b's, then its
middle character is not d. Chopping out the middle character produces a string
that still contains one d. Also note that, since |w| is odd and the number of a's
differs from the number of b's, it must differ by at least two. So, when w's
middle character is chopped out, the resulting string will still have different
numbers of a's and b's.

So chop(a*db*) contains all strings in {a^n b^n : n ≥ 0} plus some strings in
{w ∈ a*db* : |w| is even and #a(w) ≠ #b(w)}. We can now show that chop(a*db*)
is not regular. If it were, then the language L' = chop(a*db*) ∩ a*b*
would also be regular since the regular languages are closed under intersection.
But L' = {a^n b^n : n ≥ 0}, which we have already shown is not regular. So neither
is chop(a*db*). Since there exists at least one regular language L with the
property that chop(L) is not regular, the regular languages are not closed under chop.
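
The three observations about chop(a*db*) can be seen directly on a finite sample.
The sketch below is illustrative and its names are hypothetical. It applies chop to
all strings of a*db* up to a chosen length and separates the results that contain no
d, which are the ones that lie in a*b*.

```python
def chop(strings):
    """Apply chop to a finite set: drop the middle character of each odd-length
    string and ignore the even-length ones."""
    return {w[:len(w) // 2] + w[len(w) // 2 + 1:]
            for w in strings if len(w) % 2 == 1}


def sample_a_d_b(max_len):
    """All strings of a*db* of length at most max_len."""
    return {"a" * i + "d" + "b" * j
            for i in range(max_len) for j in range(max_len - i)}


chopped = chop(sample_a_d_b(9))
no_d = {w for w in chopped if "d" not in w}
print(sorted(no_d, key=len))   # ['', 'ab', 'aabb', 'aaabbb', 'aaaabbbb']
```

Every chopped string that still contains d has unequal numbers of a's and b's, as
the third observation predicts.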


EXAMPLE  8.22  The  Function  maxstring 

Define maxstring(L) = {w : w ∈ L and ∀z ∈ Σ* (z ≠ ε → wz ∉ L)}. In other
words, maxstring(L) contains exactly those strings in L that cannot be extended
on the right and still be in L. Let's look at maxstring applied to some languages:


L               maxstring(L)

∅               ∅

a*b*            ∅

ab*a            ab*a

a*b*a           a*b⁺a


EXAMPLE 8.23 The Function mix

Define mix(L) = {w : ∃x, y, z (x ∈ L, x = yz, |y| = |z|, w = yz^R)}. In other words,
mix(L) contains exactly those strings that can be formed by taking some even
length string in L and reversing its second half. Let's look at mix applied to some
languages:


L               mix(L)

∅               ∅

(a ∪ b)*        ((a ∪ b)(a ∪ b))*

(ab)*           {(ab)^(2n+1) : n ≥ 0} ∪ {(ab)^n (ba)^n : n ≥ 0}

(ab)*a(ab)*     ∅

The  regular  languages  are  closed  under  maxstring.  They  are  not  closed  under  mix. 
We  leave  the  proof  of  these  claims  as  an  exercise. 
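
Because mix acts on one string at a time, it can be computed exactly on any finite
set of strings. The sketch below is an illustration with hypothetical names. It
applies mix to the first few strings of (ab)* and compares each result with the
characterization of mix((ab)*) stated in Example 8.23.

```python
def mix(strings):
    """Apply mix to a finite set: reverse the second half of each even-length
    string and ignore the odd-length ones."""
    result = set()
    for x in strings:
        if len(x) % 2 == 0:
            half = len(x) // 2
            result.add(x[:half] + x[half:][::-1])
    return result


for m in range(7):
    (w,) = mix({"ab" * m})               # (ab)^m always has even length
    if m % 2 == 1:
        expected = "ab" * m              # the (ab)^(2n+1) family
    else:
        expected = "ab" * (m // 2) + "ba" * (m // 2)   # (ab)^n (ba)^n
    assert w == expected
```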

Exercises 

1. For each of the following languages L, state whether L is regular or not and
prove your answer:

a. {a^i b^j : i, j ≥ 0 and i + j = 5}.

b. {a^i b^j : i, j ≥ 0 and i - j = 5}.

c. {a^i b^j : i, j ≥ 0 and |i - j| ≡_5 0}.

d. {w ∈ {0, 1, #}* : w = x#y, where x, y ∈ {0, 1}* and |x|·|y| ≡_5 0}.

e. {a^i b^j : 0 ≤ i < j < 2000}.

f. {w ∈ {Y, N}* : w contains at least two Y's and at most two N's}.

g. {w = xy : x, y ∈ {a, b}* and |x| = |y| and #a(x) ≥ #a(y)}.

h. {w = xyzy^R x : x, y, z ∈ {a, b}*}.

i. {w = xyzy : x, y, z ∈ {0, 1}⁺}.

j. {w ∈ {0, 1}* : #0(w) ≠ #1(w)}.

k. {w ∈ {a, b}* : w = w^R}.

l. {w ∈ {a, b}* : ∃x ∈ {a, b}* (w = x x^R x)}.

m. {w ∈ {a, b}* : the number of occurrences of the substring ab equals the
number of occurrences of the substring ba}.

n. {w ∈ {a, b}* : w contains exactly two more b's than a's}.

o. {w ∈ {a, b}* : w = xyz, |x| = |y| = |z|, and z = x with every a replaced by
b and every b replaced by a}. Example: abbbabbaa ∈ L, with x =
abb, y = bab, and z = baa.

p. {w : w ∈ {a-z}* and the letters of w appear in reverse alphabetical order}.
For example, spoonfeed ∈ L.

q. {w : w ∈ {a-z}* and every letter in w appears at least twice}. For example,
unprosperousness ∈ L.

r. {w : w is the decimal encoding of a natural number in which the digits appear
in a non-decreasing order without leading zeros}.

s. {w of the form: <integer1>+<integer2>=<integer3>, where each of the
substrings <integer1>, <integer2>, and <integer3> is an element of {0-9}*
and integer3 is the sum of integer1 and integer2}. For example,
124+5=129 ∈ L.

t. L0*, where L0 = {b a^i b^j a^k : j ≥ 0, 0 ≤ i ≤ k}.

u. {w : w is the encoding of a date that occurs in a year that is a prime number}.
A date will be encoded as a string of the form mm/dd/yyyy, where each m, d,
and y is drawn from {0-9}.

v. {w ∈ {1}* : w is, for some n ≥ 1, the unary encoding of 10^n}. (So L =
{1111111111, 1^100, 1^1000, ...}.)

2. For each of the following languages L, state whether L is regular or not and prove
your answer:

a. {w ∈ {a, b, c}* : in each prefix x of w, #a(x) = #b(x) = #c(x)}.

b. {w ∈ {a, b, c}* : ∃ some prefix x of w (#a(x) = #b(x) = #c(x))}.

c. {w ∈ {a, b, c}* : ∃ some prefix x of w (x ≠ ε and #a(x) = #b(x) = #c(x))}.

3. Define the following two languages:

La = {w ∈ {a, b}* : in each prefix x of w, #a(x) ≥ #b(x)}.

Lb = {w ∈ {a, b}* : in each prefix x of w, #b(x) ≥ #a(x)}.

a. Let L1 = La ∩ Lb. Is L1 regular? Prove your answer.

b. Let L2 = La ∪ Lb. Is L2 regular? Prove your answer.

4. For each of the following languages L, state whether L is regular or not and prove
your answer:

a. {u w w^R v : u, v, w ∈ {a, b}⁺}.

b. {x y z y^R x : x, y, z ∈ {a, b}⁺}.

5.  Use  the  Pumping  Theorem  to  complete  the  proof,  given  in  L.3.1,  that  English 
isn't  regular. 

6.  Prove  by  construction  that  the  regular  languages  are  closed  under: 

a.  intersection. 

b.  set  difference. 

7.  Prove  that  the  regular  languages  are  closed  under  each  of  the  following  operations: 

a. pref(L) = {w : ∃x ∈ Σ* (wx ∈ L)}.

b. suff(L) = {w : ∃x ∈ Σ* (xw ∈ L)}.

c. reverse(L) = {x ∈ Σ* : x = w^R for some w ∈ L}.

d.  letter  substitution  (as  defined  in  Section  8.3). 

8. Using the definitions of maxstring and mix given in Section 8.6, give a precise
definition of each of the following languages:

a. maxstring(AnBn).

b. maxstring(a^i b^j c^k, 1 ≤ k ≤ j ≤ i).

c. maxstring(L1L2), where L1 = {w ∈ {a, b}* : w contains exactly one a} and
L2 = {a}.

d. mix((aba)*).

e. mix(a*b*).

9.  Prove  that  the  regular  languages  are  not  closed  under  mix. 

10. Recall that maxstring(L) = {w : w ∈ L and ∀z ∈ Σ* (z ≠ ε → wz ∉ L)}.

a.  Prove  that  the  regular  languages  are  closed  under  maxstring. 

b.  If  maxstring(L )  is  regular,  must  L  also  be  regular?  Prove  your  answer. 

11. Define the function midchar(L) = {c : ∃w ∈ L (w = ycz, c ∈ Σ_L, y ∈ Σ_L*, z ∈ Σ_L*,
|y| = |z|)}. Answer each of the following questions and prove your answer.

a. Are the regular languages closed under midchar?

b. Are the nonregular languages closed under midchar?

12. Define the function twice(L) = {w : ∃x ∈ L (x can be written as c1c2...cn, for
some n ≥ 1, where each ci ∈ Σ_L, and w = c1c1c2c2...cncn)}.

a. Let L = (1 ∪ 0)*1. Write a regular expression for twice(L).

b. Are the regular languages closed under twice? Prove your answer.

13. Define the function shuffle(L) = {w : ∃x ∈ L (w is some permutation of x)}. For example, if L = {ab, abc}, then shuffle(L) = {ab, abc, ba, acb, bac, bca, cab, cba}. Are the regular languages closed under shuffle? Prove your answer.

14. Define the function copyandreverse(L) = {w : ∃x ∈ L (w = xxᴿ)}. Are the regular languages closed under copyandreverse? Prove your answer.

15. Let L1 and L2 be regular languages. Let L be the language consisting of strings that are contained in exactly one of L1 and L2. Prove that L is regular.

16. Define two integers i and j to be twin primes iff both i and j are prime and |j − i| = 2.

a. Let L = {w ∈ {1}* : w is the unary notation for a natural number n such that there exists a pair p and q of twin primes, both > n}. Is L regular?

b. Let L = {x, y : x is the decimal encoding of a positive integer i, y is the decimal encoding of a positive integer j, and i and j are twin primes}. Is L regular?

17. Consider any function f(L1) = L2, where L1 and L2 are both languages over the alphabet Σ = {0, 1}. A function f is nice iff whenever L2 is regular, L1 is regular. For each of the following functions, f, state whether or not it is nice and prove your answer.

a. f(L) = Lᴿ.

b. f(L) = {w : w is formed by taking a string in L and replacing all 1's with 0's and leaving the 0's unchanged}.

c. f(L) = L ∪ 0*.

d. f(L) = {w : w is formed by taking a string in L and replacing all 1's with 0's and all 0's with 1's (simultaneously)}.

e. f(L) = {w : ∃x ∈ L (w = x00)}.

f. f(L) = {w : w is formed by taking a string in L and removing the last character}.
character}. 

18. We'll say that a language L over an alphabet Σ is splitable iff the following property holds: Let w be any string in L that can be written as c_1c_2...c_{2n}, for some n ≥ 1, where each c_i ∈ Σ. Then x = c_1c_3...c_{2n−1} is also in L.

a.  Give  an  example  of  a  splitable  regular  language. 

b.  Is  every  regular  language  splitable? 

c.  Does  there  exist  a  nonregular  language  that  is  splitable? 

19. Define the class IR to be the class of languages that are both infinite and regular. Tell whether the class IR is closed under:

a.  union. 

b.  intersection. 

c.  Kleene  star. 

20. Consider the language L = {x0ⁿy1ⁿz : n ≥ 0, x ∈ P, y ∈ Q, z ∈ R, where P, Q, and R are nonempty sets over the alphabet {0, 1}}. Can you find regular sets P, Q, and R such that L is not regular? Can you find regular sets P, Q, and R such that L is regular?

21.  For  each  of  the  following  claims,  state  whether  it  is  True  or  False.  Prove  your 
answer. 

a. There are uncountably many nonregular languages over Σ = {a, b}.

b.  The  union  of  an  infinite  number  of  regular  languages  must  be  regular. 

c.  The  union  of  an  infinite  number  of  regular  languages  is  never  regular. 

d. If L1 and L2 are not regular languages, then L1 ∪ L2 is not regular.

e. If L1 and L2 are regular languages, then L1 ⊗ L2 = {w : w ∈ (L1 − L2) or w ∈ (L2 − L1)} is regular.

f. If L1 and L2 are regular languages and L1 ⊆ L ⊆ L2, then L must be regular.

g.  The  intersection  of  a  regular  language  and  a  nonregular  language  must  be 
regular. 

h.  The  intersection  of  a  regular  language  and  a  nonregular  language  must  not  be 
regular. 

i.  The  intersection  of  two  nonregular  languages  must  not  be  regular. 

j.  The  intersection  of  a  finite  number  of  nonregular  languages  must  not  be 
regular. 

k. The intersection of an infinite number of regular languages must be regular.

l. It is possible that the concatenation of two nonregular languages is regular.

m.  It  is  possible  that  the  union  of  a  regular  language  and  a  nonregular  language 
is  regular. 

n.  Every  nonregular  language  can  be  described  as  the  intersection  of  an  infinite 
number  of  regular  languages. 

o. If L is a language that is not regular, then L* is not regular.

p. If L* is regular, then L is regular.

q.  The  nonregular  languages  are  closed  under  intersection. 

r.  Every  subset  of  a  regular  language  is  regular. 

s. Let L4 = L1L2L3. If L1 and L2 are regular and L3 is not regular, it is possible that L4 is regular.

t. If L is regular, then so is {xy : x ∈ L and y ∉ L}.

u.  Every  infinite  regular  language  properly  contains  another  infinite  regular 
language. 


CHAPTER  9 


Algorithms  and  Decision 
Procedures  for  Regular  Languages 


1.  FSMs  and  regular  expressions  are  useful  design  tools. 

2. The fact that nondeterminism can be "compiled out" of an FSM makes it even easier, for many kinds of tasks, to design a simple machine that can relatively easily be shown to be correct.

3. DFSMs run in time that is linear in the length of the input.

4.  There  exists  an  algorithm  to  minimize  a  DFSM. 

5.  The  regular  languages  are  closed  under  many  useful  operators,  so  we  can  talk 
about  programs  that  manipulate  FSMs  to  construct  new  ones. 

And  now  we  will  consider  a  sixth: 

6.  There  exist  decision  procedures  for  many  questions  that  we  would  like  to  ask 
about  FSMs  and  regular  expressions. 


9.1  Fundamental  Decision  Procedures 

Recall  from  Section  4.1  that  a  decision  procedure  is  an  algorithm  whose  result  is  a 
Boolean  value.  A  decision  procedure  must  be  guaranteed  to  halt  on  all  inputs  and  to 
return  the  correct  value. 

In this section, we describe some of the most useful decision procedures for regular languages.

9.1.1 Membership

Given an FSM M and a string w, does M accept w? This is the most basic question we can ask about an FSM. It can be answered by running M on w, provided that we do so in a fashion that guarantees that the simulation halts. Recall that the simulation of an NDFSM M might not halt if M contains ε-loops that are not handled properly by the simulator.

EXAMPLE 9.1 ε-Loops Can Cause Trouble in NDFSMs

If we are not careful, the simulation of the following NDFSM on input aa might get stuck chasing the ε-loop between q0 and q1, never reading any input characters:


THEOREM  9.1  Decidability  of  Regular  Languages 

Theorem: Given a regular language L (represented as an FSM or a regular expression or a regular grammar) and a string w, there exists a decision procedure that answers the question, is w ∈ L?

Proof: If L is represented as an FSM, we can answer the question using either of the simulation techniques described in Section 5.6. We'll choose to use ndfsmsimulate:

decideFSM(M: FSM, w: string) =

If ndfsmsimulate(M, w) accepts then return True else return False.

Any question that can be answered about an FSM can be answered about a regular expression by first converting the regular expression into an FSM. So if L is represented as a regular expression α, we can answer the question, "Does α generate w?" using the procedure decideregex defined as follows:

decideregex(α: regular expression, w: string) =

1. From α, use regextofsm to construct an FSM M such that L(α) = L(M).

2. Return decideFSM(M, w).

The  same  is  true  of  regular  grammars:  Any  regular  grammar  G  can  be  converted 
to  an  FSM  that  accepts  L(G). 
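The membership test can be sketched in Python. This is a minimal illustration of set-based NDFSM simulation, not the book's ndfsmsimulate itself: the dictionary encoding of the transition relation, the use of the empty string "" to stand for ε, and the names eps_closure and decide_fsm are all assumptions made for the example. Because ε-closures are computed with a visited set, the ε-loop between q0 and q1 cannot trap the simulator.

```python
def eps_closure(states, delta):
    """Return every state reachable from `states` using only epsilon moves."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), ()):   # "" stands for epsilon (assumption)
            if r not in closure:           # the visited set stops epsilon-loops
                closure.add(r)
                stack.append(r)
    return closure


def decide_fsm(delta, start, accepting, w):
    """Return True iff the NDFSM accepts w. Does one pass over w, so it halts."""
    current = eps_closure({start}, delta)
    for c in w:
        moved = {r for q in current for r in delta.get((q, c), ())}
        current = eps_closure(moved, delta)
    return bool(current & set(accepting))


# Hypothetical machine: an epsilon-loop between q0 and q1, then "aa" to q3.
delta = {
    ("q0", ""): {"q1"},
    ("q1", ""): {"q0"},
    ("q1", "a"): {"q2"},
    ("q2", "a"): {"q3"},
}
print(decide_fsm(delta, "q0", {"q3"}, "aa"))   # True
print(decide_fsm(delta, "q0", {"q3"}, "a"))    # False
```

The loop body runs once per input character, so the procedure halts on every input, which is what a decision procedure requires.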

While the solution to this problem was simple, the question itself is very important. We will see later that, in the case of some more powerful computational models (in particular the Turing machine), the basic membership question is not decidable. This fact is yet another powerful argument for the use of an FSM whenever one exists.

In the remainder of this discussion, we will focus on answering questions about FSMs. Each question that is decidable for FSMs is also decidable for regular expressions and for regular grammars because a regular expression or a regular grammar can be converted to an equivalent FSM.

9.1.2 Emptiness and Totality

The next question we will consider is, "Given an FSM M, is L(M) = ∅?" There are two approaches that we could take to answering a question like this about the overall behavior of M:

1. View M as a directed graph in which the states are the vertices and the transitions are directed edges. Find some property of the graph that corresponds to the situation in which L(M) = ∅.

2.  Run  M  on  some  number  of  strings  and  observe  its  behavior. 

Both work. Let's consider the first approach, in which we do a static analysis of M without running it on any strings. We observe that L(M) will be empty if K_M contains no accepting states. But then we realize that, for L(M) not to be empty, it is not sufficient for there to be at least one accepting state. That state must be reachable, via some path, from the start state. So we can state the following algorithm for testing whether L(M) = ∅:

emptyFSMgraph(M: FSM) =

1. Mark all states that are reachable via some path from the start state of M.

2. If at least one marked state is an accepting state, return False. Else return True.
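The two steps of the graph-based emptiness test can be sketched as a breadth-first search. The encoding of M as a dictionary from (state, symbol) pairs to sets of successor states, and the function name empty_fsm_graph, are assumptions made for this illustration.

```python
from collections import deque


def empty_fsm_graph(delta, start, accepting):
    """Return True iff L(M) is empty, i.e. no accepting state is reachable.

    delta maps (state, symbol) -> set of successor states. Symbols are
    ignored here: only the edges of the transition graph matter.
    """
    # Build an adjacency list once so the search does not rescan delta.
    successors = {}
    for (p, _symbol), targets in delta.items():
        successors.setdefault(p, set()).update(targets)

    # Step 1: mark every state reachable from the start state (BFS).
    marked, queue = {start}, deque([start])
    while queue:
        q = queue.popleft()
        for r in successors.get(q, ()):
            if r not in marked:
                marked.add(r)
                queue.append(r)

    # Step 2: the language is empty iff no marked state is accepting.
    return not (marked & set(accepting))


# Hypothetical machine: q2 is accepting but cannot be reached from q0.
delta = {("q0", "a"): {"q1"}, ("q1", "b"): {"q0"}, ("q2", "a"): {"q2"}}
print(empty_fsm_graph(delta, "q0", {"q2"}))   # True
print(empty_fsm_graph(delta, "q0", {"q1"}))   # False
```

Since ε-transitions are just edges in this view, the same search works for NDFSMs without any conversion.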

Another way to use the graph-testing method is to exploit the fact that there exists a canonical form for FSMs. Recall that, in Section 5.8, we described the algorithm buildFSMcanonicalform, which built, from any FSM M, an equivalent unique minimal DFSM whose states are named in a standard way so that all equivalent FSMs will generate the same minimal deterministic machine. We can use that canonical form as the basis for a simple emptiness checker, since we note that L(M) is empty iff the canonical form of M is the one-state FSM that accepts nothing. So we can define:

emptyFSMcanonicalgraph(M: FSM) =

1. Let M' = buildFSMcanonicalform(M).

2. If M' is the one-state FSM that accepts nothing, return True. Else return False.

The second, very different, approach to answering the emptiness question is to run M on some strings and see whether or not it accepts. We might start by running M on all strings in Σ* to see if it accepts any of them. But there is an infinite number of possible strings (assuming that Σ_M is not empty). A decision procedure must be guaranteed to halt in a finite number of steps, even if the answer is False. But we make the same observation here that we used as the basis for the Pumping Theorem: If a DFSM M accepts any "long" strings, then it also accepts the strings that result from pumping out from those long strings the substrings that drove the machine through a loop. More precisely, if a DFSM M accepts any strings of length greater than or equal to |K_M|, then it must also accept at least one string of length less than |K_M|. In other words, it must accept at least one string without going through any loops. So we can define emptyFSMsimulate:

emptyFSMsimulate(M: FSM) =

1. Let M' = ndfsmtodfsm(M).

2. For each string w in Σ* such that |w| < |K_M'| do:

Run decideFSM(M', w).

3. If M' accepts at least one such string, return False; else return True.
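Bounded simulation for emptiness can be sketched as follows, assuming that step 1 has already been carried out, so the input is a complete DFSM given as a dictionary from (state, symbol) to a single next state. The names run_dfsm and empty_fsm_simulate and the example machine are illustrative assumptions.

```python
from itertools import product


def run_dfsm(delta, start, accepting, w):
    """Run a complete DFSM on w; one step per symbol, so it always halts."""
    q = start
    for c in w:
        q = delta[(q, c)]
    return q in accepting


def empty_fsm_simulate(states, alphabet, delta, start, accepting):
    """Return True iff the DFSM accepts no string of length < |K|."""
    for n in range(len(states)):                 # lengths 0 .. |K| - 1
        for w in product(alphabet, repeat=n):    # every string of length n
            if run_dfsm(delta, start, accepting, w):
                return False                     # found a witness
    return True


# Hypothetical DFSM over {a, b} that accepts strings ending in "ab".
states = {0, 1, 2}
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1,
         (1, "b"): 2, (2, "a"): 1, (2, "b"): 0}
print(empty_fsm_simulate(states, "ab", delta, 0, {2}))    # False
print(empty_fsm_simulate(states, "ab", delta, 0, set()))  # True
```

The number of strings tried grows as |Σ| raised to the power |K| − 1, so this version is far slower than the graph search even though both decide the same question.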

This definition of emptyFSMsimulate exploits a powerful technique that we'll use in other decision procedures. We'll call it bounded simulation. It answers a question about L(M) by simulating the execution of M. For bounded simulation to serve as the basis of a decision procedure, two things must be true:

• The simulation of M on a particular input string must be guaranteed to halt. DFSMs always halt, so this requirement is easily met. We'll see later, however, that when we are considering more powerful machines, such as pushdown automata and Turing machines, this condition may not be satisfied.

• It must be possible to determine the answer we seek by simulating M on some finite number of strings. So we need to be able to do an analysis, of the sort we did above, that shows that once we know how M works on some particular finite set of strings, we can conclude some more general property of its behavior.

The  algorithms  that  we  have  just  presented  enable  us  to  prove  the  following  theorem: 

THEOREM 9.2 Decidability of Emptiness

Theorem: Given an FSM M, there exists a decision procedure that answers the question, is L(M) = ∅?

Proof: All three algorithms, emptyFSMgraph, emptyFSMcanonicalgraph, and emptyFSMsimulate, can easily be shown to be correct. We can pick any one of them and use it to define the procedure emptyFSM. We'll use emptyFSMsimulate:

emptyFSM(M: FSM) =

Return emptyFSMsimulate(M).

At the other extreme, we might like to ask the question, "Given an FSM M, is L(M) = Σ*?" In other words, does M accept everything? The answer is yes iff ¬L(M) = ∅. So we have the following theorem:

THEOREM 9.3 Decidability of Totality

Theorem: Given an FSM M, there exists a decision procedure that answers the question, is L(M) = Σ*?

Proof: The following procedure answers the question:

totalFSM(M: FSM) =

1. Construct M' to accept ¬L(M).

2. Return emptyFSM(M').
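A sketch of totalFSM, under the assumption that M has already been made deterministic and complete (every state has a transition on every symbol). For such a machine, the complement is obtained by swapping accepting and nonaccepting states, and emptiness of the complement is checked here by reachability rather than by bounded simulation. The function name and encoding are illustrative.

```python
def total_dfsm(states, alphabet, delta, start, accepting):
    """Return True iff the complete DFSM accepts every string over alphabet."""
    # Step 1: complement by swapping accepting and nonaccepting states.
    complement_accepting = set(states) - set(accepting)

    # Step 2: emptiness test on the complement via reachability.
    seen, stack = {start}, [start]
    while stack:
        q = stack.pop()
        if q in complement_accepting:
            return False          # some string is rejected by M
        for c in alphabet:
            r = delta[(q, c)]
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return True


# Hypothetical two-state DFSM over {a} that tracks the parity of the input.
states = {0, 1}
delta = {(0, "a"): 1, (1, "a"): 0}
print(total_dfsm(states, "a", delta, 0, {0, 1}))  # True: both states accept
print(total_dfsm(states, "a", delta, 0, {0}))     # False: odd lengths rejected
```

The swap of accepting states is only a correct complement when the machine is deterministic and complete, which is why the conversion must come first.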


9.1.3  Finiteness 

Suppose that L(M) is not empty. Then we might like to ask, "Is L(M) finite?" Again, we can attempt to answer the question either by analyzing M as a graph or by running it on strings.

Let's consider the graph approach first. L(M) is clearly finite if M contains no loops. But the mere presence of a loop does not guarantee that L(M) is infinite. The loop might be:

• labeled only with ε,

• unreachable from the start state, or

• not on a path to an accepting state.

In any of those cases, the loop will not force M to accept an infinite number of strings. Taking all of those issues into account, we can build the following correct graph-based algorithm to answer the question:

finiteFSMgraph(M: FSM) =

1. M' = ndfsmtodfsm(M).

2. M'' = minDFSM(M'). /* At this point, there are no ε-transitions and no unreachable states. */

3. Mark all states in M'' that are on a path to an accepting state.

4. Considering only marked states, determine whether there are any cycles in M''.

5. If there are cycles, return False. Else return True.
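The graph-based finiteness test can be sketched for a machine that is already deterministic. As a substitution for step 2, this sketch does not call minDFSM; it removes unreachable states with a direct search instead, which serves the same purpose here. Cycle detection uses repeated removal of states with no incoming edges (Kahn's algorithm). All names are assumptions made for the example.

```python
def finite_dfsm_graph(delta, start, accepting):
    """Return True iff L(M) is finite, for a DFSM with delta[(q, c)] = r."""
    succ, pred = {}, {}
    for (p, _c), r in delta.items():
        succ.setdefault(p, set()).add(r)
        pred.setdefault(r, set()).add(p)

    def closure(seeds, edges):
        """All states reachable from seeds by following edges."""
        seen, stack = set(seeds), list(seeds)
        while stack:
            q = stack.pop()
            for r in edges.get(q, ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    # Keep states that are reachable AND on a path to an accepting state.
    useful = closure({start}, succ) & closure(accepting, pred)

    # Kahn's algorithm: the useful subgraph is acyclic iff all of its
    # states can be removed in topological order.
    indeg = {q: 0 for q in useful}
    for q in useful:
        for r in succ.get(q, ()):
            if r in useful:
                indeg[r] += 1
    ready = [q for q in useful if indeg[q] == 0]
    removed = 0
    while ready:
        q = ready.pop()
        removed += 1
        for r in succ.get(q, ()):
            if r in useful:
                indeg[r] -= 1
                if indeg[r] == 0:
                    ready.append(r)
    return removed == len(useful)


# Hypothetical DFSMs over {a}: one accepts only "a", the other accepts a*.
only_a = {("s", "a"): "f", ("f", "a"): "d", ("d", "a"): "d"}
print(finite_dfsm_graph(only_a, "s", {"f"}))             # True
print(finite_dfsm_graph({("s", "a"): "s"}, "s", {"s"}))  # False
```

In the first machine, the loop on the dead state d is ignored because d is not on a path to an accepting state, which is exactly the third case listed above.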

While it is possible, as we have just seen, to design algorithms to answer questions about FSMs by analyzing them as graphs, it is quite easy to make mistakes, as we would have done had we not considered the three cases in which a loop does not mean that an infinite number of strings can be accepted.

It is often easier to design an algorithm and prove its correctness by appealing to the simulation strategy instead. Pursuing that approach, it may be tempting to try to answer the finiteness question by running M on all possible strings to see if it ever stops accepting. But, again, we can only use simulation in a decision procedure if we can put an upper bound on the amount of simulation that is required. Fortunately, we can do that in this case. Again we appeal to the argument that we used to prove the Pumping Theorem. We begin by making M deterministic so that we do not have to worry about ε-loops. Then observe that L(M) is infinite iff it contains any strings that force M through some loop. Any string of length greater than |K_M| must force M through a loop. So, if M accepts even one string of length greater than |K_M|, then L(M) is infinite. Note also that if L(M) is infinite then it contains no longest string. So it must contain an infinite number of strings of length greater than |K_M|. So L(M) is infinite iff M accepts even one string of length greater than |K_M|.

Unfortunately, there is an infinite number of such long strings. So we cannot try them all. But suppose that M accepts some "very long" string, i.e., one that forces M through a loop twice. Then we could pump out the substring that corresponds to the first time through the loop. We'd then have a shorter string that is also accepted by M. So if M accepts any strings that force it through a loop twice, it must also accept at least one string that forces it through a loop only once. The longest loop M could contain would be one that drives it through all its states a second time. So, L(M) is infinite iff M accepts at least one string w where:

|K_M| ≤ |w| ≤ 2·|K_M| − 1.

We  can  now  define  a  simulation-based  procedure  to  determine  whether  L(M)  is 
finite: 

finiteFSMsimulate(M: FSM) =

1. M' = ndfsmtodfsm(M).

2. For each string w in Σ* such that |K_M'| ≤ |w| ≤ 2·|K_M'| − 1 do:

Run decideFSM(M', w).

3. If M' accepts at least one such string, return False (since L is infinite and thus not finite); else return True.
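The bounded simulation for finiteness can be sketched in the same style, again assuming that step 1 is done and the input is a complete DFSM. The function name and the example machines are illustrative assumptions.

```python
from itertools import product


def finite_fsm_simulate(states, alphabet, delta, start, accepting):
    """Return True iff the DFSM accepts no w with |K| <= |w| <= 2|K| - 1."""
    k = len(states)
    for n in range(k, 2 * k):                  # n = k, k + 1, ..., 2k - 1
        for w in product(alphabet, repeat=n):
            q = start
            for c in w:                        # simulate M on w
                q = delta[(q, c)]
            if q in accepting:
                return False                   # L(M) is infinite
    return True


# Hypothetical DFSMs over {a}: the first accepts only "a", the second a*.
only_a = {("s", "a"): "f", ("f", "a"): "d", ("d", "a"): "d"}
print(finite_fsm_simulate({"s", "f", "d"}, "a", only_a, "s", {"f"}))  # True
print(finite_fsm_simulate({"s"}, "a", {("s", "a"): "s"}, "s", {"s"})) # False
```

For the first machine, |K| = 3, so only the strings aaa, aaaa, and aaaaa are tried, and none is accepted.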

THEOREM 9.4 Decidability of Finiteness

Theorem: Given an FSM M, there exists a decision procedure that answers the questions, "Is L(M) finite?" and "Is L(M) infinite?"

Proof: We can pick either finiteFSMgraph or finiteFSMsimulate and use it to define the procedure finiteFSM:

finiteFSM(M: FSM) =

Return finiteFSMsimulate(M).

Of course, if we can decide whether L(M) is finite, we can decide whether it is infinite:

infiniteFSM(M: FSM) =

Return ¬(finiteFSMsimulate(M)).


9.1.4  Equivalence 

Given two FSMs M1 and M2, are they equivalent? In other words, is L(M1) = L(M2)? We can describe two different algorithms for answering this question.

The first algorithm takes advantage of the existence of a canonical form for FSMs. It works as follows:

equalFSMs1(M1: FSM, M2: FSM) =

1. M1' = buildFSMcanonicalform(M1).

2. M2' = buildFSMcanonicalform(M2).

3. If M1' and M2' are equal, return True, else return False.

The second algorithm depends on the following observation: Let L1 and L2 be the languages accepted by M1 and M2. Then M1 and M2 are equivalent iff (L1 − L2) ∪ (L2 − L1) = ∅. Since the regular languages are closed under difference and union, we can build an FSM to accept (L1 − L2) ∪ (L2 − L1). We can then test to see whether that FSM accepts any strings. So we have:

equalFSMs2(M1: FSM, M2: FSM) =

1. Construct MA to accept L(M1) − L(M2).

2. Construct MB to accept L(M2) − L(M1).

3. Construct MC to accept L(MA) ∪ L(MB).

4. Return emptyFSM(MC).
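The second algorithm can be sketched compactly for two complete DFSMs over the same alphabet. Rather than building MA, MB, and MC as separate machines, this sketch explores the product machine on the fly: a reachable pair of states in which exactly one component accepts is a witness that the symmetric difference is nonempty. Folding the four steps into one search is a simplification made for the example, and the names are assumptions.

```python
def equal_dfsms(alphabet, d1, s1, f1, d2, s2, f2):
    """Return True iff two complete DFSMs accept the same language.

    d1, d2 map (state, symbol) -> state; s1, s2 are start states;
    f1, f2 are sets of accepting states.
    """
    seen, stack = {(s1, s2)}, [(s1, s2)]
    while stack:
        p, q = stack.pop()
        if (p in f1) != (q in f2):
            return False          # some string is in exactly one language
        for c in alphabet:
            nxt = (d1[(p, c)], d2[(q, c)])
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True


# Hypothetical machines over {a}: even length, written with 2 and 4 states,
# and a third machine that accepts lengths divisible by 3.
even2 = {(0, "a"): 1, (1, "a"): 0}
even4 = {(i, "a"): (i + 1) % 4 for i in range(4)}
mod3 = {(i, "a"): (i + 1) % 3 for i in range(3)}
print(equal_dfsms("a", even2, 0, {0}, even4, 0, {0, 2}))  # True
print(equal_dfsms("a", even2, 0, {0}, mod3, 0, {0}))      # False
```

At most |K_M1| · |K_M2| pairs are visited, so the search always halts.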

THEOREM  9.5  Decidability  of  Equivalence 

Theorem: Given two FSMs M1 and M2, there exists a decision procedure that answers the question, "Is L(M1) = L(M2)?"

Proof: We can pick the approach of either equalFSMs1 or equalFSMs2 and use it to define the procedure equalFSMs. Choosing equalFSMs2, we get:

equalFSMs(M1: FSM, M2: FSM) =

Return equalFSMs2(M1, M2).


9.1.5  Minimality 

THEOREM  9.6  Decidability  of  Minimality 

Theorem:  Given  a  DFSM  M,  there  exists  a  decision  procedure  that  answers  the 
question,  “Is  M  minimal?” 

Proof: The proof is by construction. We define:

minimalFSM(M: FSM) =

1. M' = minDFSM(M).

2. If |K_M| = |K_M'| return True; else return False.

Note that it is easy to modify minimalFSM so that, if M is not minimal, it returns |K_M| − |K_M'|.
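A sketch of the minimality test for a complete DFSM. It does not construct M' explicitly; it only counts the states that a minimal machine would have, by discarding unreachable states and then refining a partition of the remaining states until no block splits (Moore-style refinement). This stands in for minDFSM and is an assumption of the example, as are the function names.

```python
def count_min_states(alphabet, delta, start, accepting):
    """Number of states in a minimal complete DFSM equivalent to M."""
    symbols = sorted(alphabet)                 # fixed order for signatures
    # Discard unreachable states.
    reach, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for c in symbols:
            r = delta[(q, c)]
            if r not in reach:
                reach.add(r)
                stack.append(r)
    # Start with accepting vs. nonaccepting, then refine until stable.
    block = {q: int(q in accepting) for q in reach}
    while True:
        ids, refined = {}, {}
        for q in reach:
            sig = (block[q],) + tuple(block[delta[(q, c)]] for c in symbols)
            refined[q] = ids.setdefault(sig, len(ids))
        if len(ids) == len(set(block.values())):
            return len(ids)                    # no block was split
        block = refined


def minimal_dfsm(states, alphabet, delta, start, accepting):
    """Return True iff M already has the minimal number of states."""
    return len(states) == count_min_states(alphabet, delta, start, accepting)


# Hypothetical machines over {a} for "even length", with 2 and 4 states.
even2 = {(0, "a"): 1, (1, "a"): 0}
even4 = {(i, "a"): (i + 1) % 4 for i in range(4)}
print(minimal_dfsm({0, 1}, "a", even2, 0, {0}))           # True
print(minimal_dfsm({0, 1, 2, 3}, "a", even4, 0, {0, 2}))  # False
print(4 - count_min_states("a", even4, 0, {0, 2}))        # 2 excess states
```

The last line illustrates the modification mentioned above: the difference between the two state counts is the number of states that minimization would remove.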

9.1.6  Combining  the  Basics  to  Ask  Specific  Questions 

With these fundamental decision algorithms in hand, coupled with the other functions (such as ndfsmtodfsm and minDFSM) that we have also defined, it is possible to answer a wide range of specific questions that might be of interest in a particular context.


EXAMPLE  9.2  Combining  Algorithms  and  Decision  Procedures 

Suppose that we would like to know, for two arbitrary patterns, whether there are any nontrivial (which we may define, for example, as not equal to ε) strings that could match both patterns. This might come up if we are attempting to categorize strings in such a way that no string falls into more than one category. We can formalize that question as, "Given two regular expressions α1 and α2, is (L(α1) ∩ L(α2)) − {ε} ≠ ∅?" An algorithm to answer that question is:

1. From α1, construct an FSM M1 such that L(α1) = L(M1).

2. From α2, construct an FSM M2 such that L(α2) = L(M2).

3. Construct M' such that L(M') = L(M1) ∩ L(M2).

4. Construct Mε such that L(Mε) = {ε}.

5. Construct M'' such that L(M'') = L(M') − L(Mε).

6. If L(M'') is empty return False; else return True.
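Steps 3 through 6 can be sketched for two complete DFSMs, assuming that steps 1 and 2 (the conversion from regular expressions) have already been done by hand. The sketch searches the product machine, but starts from the pairs reached after one symbol, so that ε is never counted as a witness. This shortcut replaces the explicit construction of Mε and M'' and is an assumption of the example, as are the names.

```python
def share_nontrivial_string(alphabet, d1, s1, f1, d2, s2, f2):
    """Return True iff some nonempty string is accepted by both DFSMs."""
    # Pairs reachable from the start pair after reading at least one symbol.
    frontier = [(d1[(s1, c)], d2[(s2, c)]) for c in alphabet]
    seen = set(frontier)
    while frontier:
        p, q = frontier.pop()
        if p in f1 and q in f2:
            return True            # a nonempty string matches both patterns
        for c in alphabet:
            nxt = (d1[(p, c)], d2[(q, c)])
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False


# Hypothetical hand-built DFSMs over {a, b}.
a_star = {("A", "a"): "A", ("A", "b"): "D", ("D", "a"): "D", ("D", "b"): "D"}
b_star = {("B", "b"): "B", ("B", "a"): "E", ("E", "a"): "E", ("E", "b"): "E"}
even = {("e", "a"): "o", ("e", "b"): "o", ("o", "a"): "e", ("o", "b"): "e"}
# a* and "even length" share aa; a* and b* share only the empty string.
print(share_nontrivial_string("ab", a_star, "A", {"A"}, even, "e", {"e"}))    # True
print(share_nontrivial_string("ab", a_star, "A", {"A"}, b_star, "B", {"B"}))  # False
```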


9.2  Summary  of  Algorithms  and  Decision  Procedures  for 
Regular  Languages 

Sprinkled throughout our discussion of regular languages has been a collection of algorithms that can be applied to FSMs, regular expressions, and regular grammars. Together, those algorithms make it possible to:

• optimize FSMs,

•  construct  new  FSMs  and  regular  expressions  from  existing  ones,  thus  enabling  us  to 
decompose  complex  problems  into  simpler  ones  and  to  reuse  code  that  has  already 
been  written,  and 

• answer a wide variety of questions about any regular language or about the class of regular languages.

Because there are so many of these algorithms and they have been spread out over several chapters, we present a concise list of them here:

•  Algorithms  that  operate  on  FSMs  without  altering  the  language  that  is  accepted: 

• Ndfsmtodfsm: Given an NDFSM M, construct a DFSM M' such that L(M) = L(M').

• MinDFSM: Given a DFSM M, construct a minimal DFSM M' such that L(M) = L(M').

• Algorithms that compute functions of languages defined as FSMs:

• Given two FSMs M1 and M2, construct a new FSM M3 such that L(M3) = L(M2) ∪ L(M1).

• Given two FSMs M1 and M2, construct a new FSM M3 such that L(M3) = L(M2)L(M1) (i.e., the concatenation of L(M2) and L(M1)).

• Given an FSM M, construct a new FSM M' such that L(M') = (L(M))*.

• Given an FSM M, construct a new FSM M' such that L(M') = ¬L(M).

• Given two FSMs M1 and M2, construct a new FSM M3 such that L(M3) = L(M2) ∩ L(M1).

• Given two FSMs M1 and M2, construct a new FSM M3 such that L(M3) = L(M2) − L(M1).

• Given an FSM M, construct a new FSM M' such that L(M') = (L(M))ᴿ (i.e., the reverse of L(M)).

• Given an FSM M, construct an FSM M' that accepts letsub(L(M)), where letsub is a letter substitution function.

•  Algorithms  that  convert  between  FSMs  and  regular  expressions: 

• Given a regular expression α, construct an FSM M such that L(α) = L(M).

• Given an FSM M, construct a regular expression α such that L(α) = L(M).

•  Algorithms  that  convert  between  FSMs  and  regular  grammars: 

• Given a regular grammar G, construct an FSM M such that L(G) = L(M).

• Given an FSM M, construct a regular grammar G such that L(G) = L(M).

• Algorithms that implement operations on languages defined by regular expressions or regular grammars: Any operation that can be performed on languages defined by FSMs can be implemented by converting all regular expressions or regular grammars to equivalent FSMs and then executing the appropriate FSM algorithm.

•  Decision  procedures  that  answer  questions  about  languages  defined  by  FSMs: 

• Given an FSM M and a string w, is w accepted by M?

• Given an FSM M, is L(M) = ∅?

• Given an FSM M, is L(M) = Σ*?

• Given an FSM M, is L(M) finite (or infinite)?

• Given two FSMs M1 and M2, is L(M1) = L(M2)?

•  Given  a  DFSM  M ,  is  M  minimal? 

•  Decision  procedures  that  answer  questions  about  languages  defined  by  regular 
expressions  or  regular  grammars:  Again,  convert  the  regular  expressions  or  regular 
grammars  to  FSMs  and  apply  the  FSM  algorithms. 

This list is important and it represents a strong argument for describing problems as regular languages and solutions as FSMs or regular expressions. As we will soon see, a few of these algorithms (but not most) exist for context-free languages and their associated representations (as pushdown automata or as context-free grammars). None of them exists for general purpose programming languages or Turing machines.


At this point, we are concerned primarily with the existence of the algorithms that we need. In Part V, we'll expand our inquiry to include the complexity of the algorithms that we have found. But we can note here that not all of the algorithms that we have presented so far are efficient in the common sense of running in time that is polynomial in the length of the input. For example, ndfsmtodfsm may construct a DFSM whose size grows exponentially in the size of the input NDFSM. Thus its time requirement (in the worst case) is also exponential.
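The exponential growth can be observed on a standard family of examples: the NDFSM with k + 1 states that accepts the strings over {a, b} whose k-th symbol from the end is a. The sketch below runs a plain subset construction (without ε-transitions, which this family does not need) and counts the reachable subsets. It is an illustration written for this note, not the book's ndfsmtodfsm, and the function names are assumptions.

```python
def kth_from_end_ndfsm(k):
    """Transition relation of an NDFSM with states 0..k (state k accepting).

    State 0 loops on every symbol and, on a, also guesses that this a is
    the k-th symbol from the end by moving to state 1.
    """
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, k):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta


def subset_construction_size(delta, start, alphabet):
    """Number of DFSM states (reachable subsets) the construction creates."""
    start_set = frozenset({start})
    seen, stack = {start_set}, [start_set]
    while stack:
        subset = stack.pop()
        for c in alphabet:
            target = frozenset(r for q in subset for r in delta.get((q, c), ()))
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return len(seen)


for k in range(1, 6):
    size = subset_construction_size(kth_from_end_ndfsm(k), 0, "ab")
    print(k + 1, "NDFSM states ->", size, "DFSM states")   # size is 2 ** k
```

Every reachable subset contains state 0 together with an arbitrary subset of the states 1 through k, which is why the count doubles each time k grows by one.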


Exercises 

1. Define a decision procedure for each of the following questions. Argue that each of your decision procedures gives the correct answer and terminates.

a. Given two DFSMs M1 and M2, is L(M1) = L(M2)ᴿ?

b. Given two DFSMs M1 and M2, is |L(M1)| < |L(M2)|?

c. Given a regular grammar G and a regular expression α, is L(G) = L(α)?

d. Given two regular expressions, α and β, do there exist any even length strings that are in L(α) but not L(β)?

e. Let Σ = {a, b} and let α be a regular expression. Does the language generated by α contain all the even length strings in Σ*?

f. Given an FSM M and a regular expression α, is it true that both L(M) and L(α) are finite and M accepts exactly two more strings than α generates?

g. Let Σ = {a, b} and let α and β be regular expressions. Is the following sentence true:

(L(β) = a*) ∨ (∀w (w ∈ {a, b}* ∧ |w| even) → w ∈ L(α)).

h.  Given  a  regular  grammar  G,  is  L(G)  regular? 

i. Given a regular grammar G, does G generate any odd length strings?

Summary and References


Theoretically, every machine we build is a finite state machine. There is only a finite number (probably about 10^79) of atoms in the observable universe (that part of the universe that is within a distance of the speed of light times the age of the universe). So we have access to only a finite number of molecules with which to build computer memories, hard drives, and external storage devices. That doesn't mean that every real problem should be described as a regular language or solved with an FSM. FSMs and regular expressions are powerful tools for describing problems that possess the kind of repetitive patterns that FSMs and regular expressions can capture. To handle other problems and languages, we will need the more powerful models that we will introduce in Parts III and IV. The abstract machines that are built using those models will be equipped with infinite storage devices. Describing problems using those devices may be useful even if there exists some practical upper bound on the size of the actual inputs that need to be considered (and so some bound on the amount of memory required to solve the problem).

A lighthearted view of the theory of automata and computability has inspired a collection of poems by Martin Cohn and Harry Mairson. We include one of the poems here. Unfortunately, the names of the important concepts aren't standard and the poem uses some that are different from ours. So:

•  DFA  (Deterministic  Finite  Automaton)  is  equivalent  to  DFSM. 

•  The  symbol  p  is  used  as  we  used  k  in  the  pumping  theorem. 

• The term r.e. (recursively enumerable), in the last line, refers to the class of languages we are calling semidecidable.

The Pumping Lemma for DFAs By Martin Cohn

Any  regular  language  L  has  a  magic  number  p 

And  any  long-enough  ‘word'  in  L  has  the  following  property: 

Amongst  its  first  p  symbols  is  a  segment  you  can  find 
Whose  repetition  or  omission  leaves  ‘word’  amongst  its  kind. 
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So  if  you  find  a  language  L  which  fails  this  acid  test. 

And  some  long  word  you  pump  becomes  distinct  from  all  the  rest. 
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By  contradiction  you  have  shown  that  language  L  is  not 
A  regular  L,  resilient  to  the  damage  you  have  wrought. 

But  if,  upon  the  other  hand, ‘word’  stays  within  its  L. 

Then  either  L  is  regular,  or  else  you  chose  not  well. 

For  ‘word’  is  parsed  as  xyz,  and  y  cannot  be  null. 

And  y  must  come  before  p  symbols  have  been  read  in  full. 

You  cannot  choose  the  length  of  y,  nor  can  you  specify 
Just  where  within  the  word  you  chose  it  happens  just  to  lie. 
The  DFA  locates  string  y  to  your  discomfiture. 

Recall  this  moral  to  the  grave:  You  can't  fool  Mother  Nature. 

As  postscript  mathematical,  addendum  to  the  wise: 

The  basic  proof  we  outlined  here  does  surely  generalize. 

So  there's  a  pumping  lemma  for  languages  context-free. 

But  sadly  we  do  not  have  the  same  for  those  that  are  r.e. 
CONTEXT-FREE LANGUAGES AND
PUSHDOWN AUTOMATA


In  this  section,  we  move  out  one  level  and  explore  the  class  of  context-free 
languages. 


This  class  is  important.  For  most  programming  languages,  the  set  of  syntactically 
legal  statements  is  (except  possibly  for  type  checking)  a  context-free  language. 
The  set  of  well-formed  Boolean  queries  is  a  context-free  language.  A  great  deal 
of  the  syntax  of  English  can  be  described  in  the  context-free  framework  that  we 
are  about  to  discuss.  To  describe  these  languages,  we  need  more  power  than  the 
regular language definition allows. For example, to describe both programming
language statements and Boolean queries requires the ability to specify that
parentheses be balanced. Yet we showed in Section 8.4 that it is not possible to
define a regular language that contains exactly the set of strings of balanced
parentheses.

We will begin our discussion of the context-free languages by defining a grammatical
formalism that can be used to describe every language in the class (which, by the way,
does include the language of balanced parentheses). Then, in Chapter 12, we will return
to the question of defining machines that can accept strings in the language. At that
point, we'll see that the pushdown automaton, an NDFSM augmented with a single stack,
can accept exactly the class of context-free languages that we are about to describe. In
Chapter  13,  we  will  see  that  the  formalisms  that  we  have  presented  stop  short 
of  the  full  power  that  is  provided  by  a  more  general  computational  model.  So 
we'll  see  that  there  are  straightforward  languages  that  are  not  context-free. 
But  because  of  the  restrictions  that  the  context-free  formalism  imposes,  it  will 
turn  out  to  be  possible  to  define  algorithms  that  perform  at  least  the  most  basic 
operations  on  context-free  languages,  including  deciding  whether  a  string  is  in 
a  language.  We'll  summarize  those  algorithms  in  Chapters  14  and  15. 

The  theory  that  we  are  about  to  present  for  the  context-free  languages  is 
not  as  straightforward  and  elegant  as  the  one  that  we  have  just  described  for 
the regular languages. We'll see, for example, that there doesn't exist an algorithm
that compares two pushdown automata to see if they are equivalent.
Given an arbitrary context-free grammar G, there doesn't exist a linear-time
algorithm that decides whether a string w is an element of L(G). But there
does  exist  such  an  algorithm  if  we  restrict  our  attention  to  a  useful  subset  of 
the  context-free  languages.  The  context-free  languages  are  not  closed  under 
many  common  operations  like  intersection  and  complement. 

On  the  other  hand,  because  the  class  of  context-free  languages  includes 
most programming languages, query languages, and a host of other languages
that we use daily to communicate with computers, it is worth taking
the  time  to  work  through  the  theory  that  is  presented  here,  even  though  it  is 
less  clear  than  the  one  we  were  able  to  build  in  Part  II. 


CHAPTER  11 


Context-Free  Grammars 


We  saw,  in  our  discussion  of  the  regular  languages  in  Part  II,  that  there  are 
substantial advantages to using descriptive frameworks (in that case, FSMs,
regular expressions, and regular grammars) that offer less power and flexibility
than a general purpose programming language provides. Because the frameworks
were restrictive, we were able to describe a large class of useful operations that
could  be  performed  on  the  languages  that  we  defined. 

We  will  begin  our  discussion  of  the  context-free  languages  with  another  restricted 
formalism, the context-free grammar. But before we define it, we will pause and
answer the more general question, “What is a grammar?”


11.1 Introduction to Rewrite Systems and Grammars

We’ll  begin  with  a  very  general  computational  model:  Define  a  rewrite  system  (also 
called  a  production  system  or  a  rule-based  system)  to  be  a  list  of  rules  and  an  algorithm 
for  applying  them.  Each  rule  has  a  left-hand  side  and  a  right-hand  side.  For  example, 
the  following  could  be  rewrite-system  rules: 

S → aSb
aS → ε
aSb → bSabSa

In the discussion that follows, we will focus on rewrite systems that operate on
strings.  But  the  core  ideas  that  we  will  present  can  be  used  to  define  rewrite  systems 
that  operate  on  richer  data  structures.  Of  course,  such  data  structures  can  be  represented 
as  strings,  but  the  power  of  many  practical  rule-based  systems  comes  from  their  ability 
to  manipulate  other  structures  directly. 
Expert systems (M.3.3) are programs that perform tasks in domains like
engineering, medicine, and business, that require expertise when done by
people. Many kinds of expertise can naturally be modeled as sets of
condition/action rules. So many expert systems are built using tools that
support rule-based programming.

Rule-based systems are also used to model business practices (M.3.4) and
as the basis for reasoning about the behavior of nonplayer characters in
computer games. (N.3.3)


When a rewrite system R is invoked on some initial string w, it operates as follows:

simple-rewrite(R: rewrite system, w: initial string) =

1. Set working-string to w.

2.  Until  told  by  R  to  halt  do: 

2.1.  Match  the  left-hand  side  of  some  rule  against  some  part  of  working-string. 

2.2.  Replace  the  matched  part  of  working-string  with  the  right-hand  side  of 
the  rule  that  was  matched. 

3.  Return  working-string. 

If simple-rewrite(R, w) can return some string s then we'll say that R can derive s
from w or that there exists a derivation in R of s from w.
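One way to make simple-rewrite concrete is the following Python sketch. It is only an illustration under stated assumptions: rules are plain (left, right) pairs of strings, the empty Python string stands for ε, every possible match is explored rather than a single one chosen, and a step bound (the hypothetical parameter max_steps) stands in for "until told by R to halt".

```python
def one_step(rules, working):
    """Return every string obtainable from `working` by one rule application."""
    results = set()
    for left, right in rules:
        start = working.find(left)
        while start != -1:
            # Replace this particular occurrence of the left-hand side.
            results.add(working[:start] + right + working[start + len(left):])
            start = working.find(left, start + 1)
    return results


def derivable(rules, w, max_steps):
    """Strings derivable from w in at most max_steps rewriting steps."""
    seen = {w}
    frontier = {w}
    for _ in range(max_steps):
        frontier = {t for s in frontier for t in one_step(rules, s)} - seen
        seen |= frontier
    return seen


# The three example rules from the start of this section ("" stands for ε).
RULES = [("S", "aSb"), ("aS", ""), ("aSb", "bSabSa")]
print(sorted(one_step(RULES, "aSb")))  # ['aaSbb', 'b', 'bSabSa']
```

Each of the three rules matches the working string aSb, so there are three possible next working strings. Which one an actual rewrite system produces depends on how it chooses a match in step 2.1.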


Rewrite systems can model natural growth processes, as occur, for example,
in plants. In addition, evolutionary algorithms can be applied to rule sets.
Thus rewrite systems can model evolutionary processes. (O.2.2)


We  can  define  a  particular  rewrite-system  formalism  by  specifying  the  form  of  the  rules 
that  are  allowed  and  the  algorithm  by  which  they  will  be  applied.  In  most  of  the  rewrite- 
system  formalisms  that  we  will  consider,  a  rule  is  simply  a  pair  of  strings.  If  the  string  on 
the  left-hand  side  matches,  it  is  replaced  by  the  string  on  the  right-hand  side.  But  more 
flexible  forms  are  also  possible.  For  example,  variables  may  be  allowed.  Let  x  be  a 
variable. Then  consider  the  rule: 

axa → aa

This rule will squeeze out whatever comes between a pair of a's.

Another  useful  form  allows  regular  expressions  as  left-hand  sides.  If  we  do  that,  we 
can  write  rules  like  the  following,  which  squeezes  out  b's  between  a's: 

ab*ab*a → aaa


The  extended  form  of  regular  expressions  that  is  supported  in  programming 
languages  like  Perl  is  often  used  to  write  substitution  rules.  (Appendix  O) 
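As a small illustration, the regular-expression rule above can be tried out with the substitution function in Python's standard re module. The function name squeeze_bs is hypothetical, and applying the rule once at the leftmost match is an assumption of this sketch; a rewrite-system formalism is free to choose matches differently.

```python
import re


def squeeze_bs(working):
    """Apply the rule ab*ab*a -> aaa once, at the leftmost match."""
    return re.sub(r"ab*ab*a", "aaa", working, count=1)


print(squeeze_bs("cabbabac"))  # caaac
print(squeeze_bs("abab"))      # abab (only two a's, so the rule does not match)
```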
In addition to describing the form of its rules, a rewrite-system formalism must
describe how its rules will be applied. In particular, a rewrite-system formalism will define
the  conditions  under  which  simple-rewrite  will  halt  and  the  method  by  which  it  will 
choose  a  match  in  step  2.1.  For  example,  one  rewrite-system  formalism  might  specify 
that  any  rule  that  matches  may  be  chosen.  A  different  formalism  might  specify  that  the 
rules  have  to  be  tried  in  the  order  in  which  they  are  written,  with  the  first  one  that 
matches  being  the  one  that  is  chosen  next. 

Rewrite systems can be used to define functions. In this case, we write rules that
operate on an input string to produce the required output string. Rewrite systems can
also be used to define languages. In this case, we define a unique start symbol. The rules
then  apply  and  we  will  say  that  the  language  L  that  is  generated  by  the  system  is  exactly 
the  set  of  strings,  over  L's  alphabet,  that  can  be  derived  by  simple-rewrite  from  the  start 
symbol. 


A  rewrite-system  formalism  can  be  viewed  as  a  programming  language  and 
some such languages turn out to be useful. For example, Prolog (M.2.3)
supports a style of programming called logic programming. A logic program is a
set of rules that correspond to logical statements of the form A if B. The
interpreter for a logic program reasons backwards from a goal (such as A),
chaining  rules  together  until  each  right-hand  side  has  been  reduced  to  a  set 
of  facts  (axioms)  that  are  already  known  to  be  true. 


The  study  of  rewrite  systems  has  played  an  important  role  in  the  development  of  the 
theory of computability. We'll see in Part V that there exist rewrite-system formalisms
that have the same computational power as the Turing machine, both with respect to
computing functions and with respect to defining languages. In the rest of our
discussion in this chapter, however, we will focus just on their use to define languages.

A  rewrite  system  that  is  used  to  define  a  language  is  called  a  grammar.  If  G  is  a 
grammar,  let  L(G)  be  the  language  that  G  generates.  Like  every  rewrite  system,  every 
grammar  contains  a  list  (almost  always  treated  as  a  set,  i.e.,  as  an  unordered  list)  of 
rules.  Also,  like  every  rewrite  system,  every  grammar  works  with  an  alphabet,  which  we 
can  call  V.  In  the  case  of  grammars,  we  will  divide  V  into  two  subsets: 

• a terminal alphabet, generally called Σ, which contains the symbols that make up
the  strings  in  L(G),  and 

•  a  nonterminal  alphabet,  the  elements  of  which  will  function  as  working  symbols 
that  will  be  used  while  the  grammar  is  operating.  These  symbols  will  disappear  by 
the  time  the  grammar  finishes  its  job  and  generates  a  string. 

One  final  thing  is  required  to  specify  a  grammar.  Each  grammar  has  a  unique  start 
symbol, often called S.


Grammars  can  be  used  to  describe  phenomena  as  different  as  English  (L.3), 
programming languages like Java (G.1), music (N.1), dance (Q.2.1), the
growth of living organisms (O.2.2), and the structure of RNA. (K.4)
A grammar formalism (like any rewrite-system formalism) specifies the form of the
rules that are allowed and the algorithm by which they will be applied. The grammar
formalisms that we will consider vary in the form of the rules that they allow. With one
exception (Lindenmayer systems, which we'll describe in Section 24.4), all of the
grammar formalisms that we will consider include a control algorithm that ignores rule
order.  Any  rule  that  matches  may  be  applied  next. 

To generate strings in L(G), we invoke simple-rewrite(G, S). Simple-rewrite will
begin with S and will apply the rules of G, which can be thought of (given the control
algorithm we just described) as licenses to replace one string by another. At each step
of one of its derivations, some rule whose left-hand side matches somewhere in
working-string is selected. The substring that matched is replaced by the rule's
right-hand side, generating a new value for working-string.


Grammars  can  be  used  to  define  languages  that,  in  turn,  define  sets  of  things 
that don't look at all like strings. For example, SVG (Q.1.3) is a language that
is  used  to  describe  two-dimensional  graphics.  SVG  can  be  described  with  a 
context-free  grammar. 


We will use the symbol ⇒ to indicate steps in a derivation. So, for example, suppose
that G has the start symbol S and the rules S → aSb, S → bSa, and S → ε. Then a
derivation could begin with:

S ⇒ aSb ⇒ aaSbb ⇒ ...

At  each  step,  it  is  possible  that  more  than  one  rule’s  left-hand  side  matches  the 
working  string.  It  is  also  possible  that  a  rule's  left-hand  side  matches  the  working  string 
in more than one way. In either case, there is a derivation corresponding to each
alternative. It is precisely the existence of these choices that enables a grammar to generate
more  than  one  string. 

Continuing with our example, there are three choices at the next step:

S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb (using the first rule),

S ⇒ aSb ⇒ aaSbb ⇒ aabSabb (using the second rule), and

S ⇒ aSb ⇒ aaSbb ⇒ aabb (using the third rule).

The  derivation  process  may  end  whenever  one  of  the  following  things  happens: 

1. The working string no longer contains any nonterminal symbols (including, as a
special case, when the working string is ε), or

2. There are nonterminal symbols in the working string but there is no match with
the left-hand side of any rule in the grammar. For example, if the working string
were AaBb, this would happen if the only left-hand side were C.

In  the  first  case,  but  not  the  second,  we  say  that  the  working  string  is  generated  by 
the  grammar.  Thus,  the  language  that  a  grammar  generates  includes  only  strings  over 
the terminal alphabet (i.e., strings in Σ*). In the second case, we have a blocked or
non-terminated derivation but no generated string.
It is also possible that, in a particular case, neither 1 nor 2 is achieved. Suppose, for
example, that a grammar contained only the rules S → Ba and B → bB, with S the
start symbol. Then all derivations proceed in the following way:

S ⇒ Ba ⇒ bBa ⇒ bbBa ⇒ bbbBa ⇒ bbbbBa ⇒ ...

The working string is always rewriteable (in only one way, as it happens), and so this
grammar can produce no terminated derivations consisting entirely of terminal
symbols (i.e., generated strings). Thus this grammar generates the language ∅.

11.2  Context-Free  Grammars  and  Languages 

We’ve  already  seen  our  first  specific  grammar  formalism.  In  Chapter  7,  we  defined  a 
regular  grammar  to  be  one  in  which  every  rule  must: 

•  have  a  left-hand  side  that  is  a  single  nonterminal,  and 

• have a right-hand side that is ε or a single terminal or a single terminal followed by
a single nonterminal.

We  now  define  a  context-free  grammar  (or  CFG)  to  be  a  grammar  in  which  each 
rule  must: 

•  have  a  left-hand  side  that  is  a  single  nonterminal,  and 

•  have  a  right-hand  side. 

To  simplify  the  discussion  that  follows,  define  an  A  rule,  for  any  nonterminal  symbol 
A,  to  be  a  rule  whose  left-hand  side  is  A. 

Next  we  must  define  a  control  algorithm  of  the  sort  we  described  at  the  end  of  the 
last  section.  A  derivation  will  halt  whenever  no  rule’s  left-hand  side  matches  against 
working-string.  At  every  step,  any  rule  that  matches  may  be  chosen. 

Context-free  grammar  rules  may  have  any  (possibly  empty)  sequence  of  symbols  on  the 
right-hand  side.  Because  the  rule  format  is  more  flexible  than  it  is  for  regular  grammars,  the 
rules are more powerful. We will soon show some examples of languages that can be
generated with context-free grammars but that cannot be generated with regular ones.

All of the following are allowable context-free grammar rules (assuming appropriate
alphabets):

S → aSb
S → ε
T → T
S → aSbbTT

The  following  are  not  allowable  context-free  grammar  rules: 

ST → aSb
a → aSb
ε → a

The  name  for  these  grammars,  “context-free,”  makes  sense  because,  using  these 
rules,  the  decision  to  replace  a  nonterminal  by  some  other  sequence  is  made  without 
looking  at  the  context  in  which  the  nonterminal  occurs.  In  Chapters  23  and  24  we  will 
consider  less  restrictive  grammar  formalisms  in  which  the  left-hand  sides  of  the  rules 
may contain several symbols. For example, the rule aSa → aTa would be allowed. This
rule says that S can be replaced by T when it is surrounded by a's. One of those
formalisms is called “context-sensitive” because its rules allow context to be considered.


Programming  language  syntax  is  typically  described  using  context-free 
grammars,  as  we’ll  see  below  and  in  Appendix  G. 


Formally, a context-free grammar G is a quadruple (V, Σ, R, S), where:

• V is the rule alphabet, which contains nonterminals (symbols that are used in the
grammar but that do not appear in strings in the language) and terminals,

• Σ (the set of terminals) is a subset of V,

• R (the set of rules) is a finite subset of (V − Σ) × V*, and

• S (the start symbol) can be any element of V − Σ.

Given a grammar G, define x ⇒_G y (abbreviated ⇒ when G is clear from context)
to be the binary relation derives-in-one-step, defined so that:

∀x, y ∈ V* (x ⇒_G y iff x = αAβ, y = αγβ, and there exists a rule A → γ in R_G)

Any sequence of the form w_0 ⇒_G w_1 ⇒_G w_2 ⇒_G ... ⇒_G w_n is called a derivation
in G. Let ⇒_G* be the reflexive, transitive closure of ⇒_G. We'll call ⇒_G* the derives
relation.

The language generated by G, denoted L(G), is {w ∈ Σ* : S ⇒_G* w}. In other
words, the language generated by G is the set of all strings of terminals that can be
derived from S using zero or more applications of rules in G.
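These definitions translate almost directly into code. The following Python sketch is an illustration only. It assumes that every symbol is a single character, that a grammar is the tuple (V, Σ, R, S) with R a list of (left, right) pairs, and that the empty string stands for ε. Since L(G) may be infinite, the hypothetical parameter max_steps bounds the number of derivation steps explored.

```python
def derives_in_one_step(x, rules):
    """All y with x => y: replace one nonterminal A in x using a rule A -> gamma."""
    ys = set()
    for i, symbol in enumerate(x):
        for left, gamma in rules:
            if symbol == left:
                ys.add(x[:i] + gamma + x[i + 1:])
    return ys


def generated(grammar, max_steps):
    """Terminal strings derivable from the start symbol in <= max_steps steps."""
    V, sigma, rules, start = grammar
    forms = {start}
    language = set()
    for _ in range(max_steps):
        forms = {y for x in forms for y in derives_in_one_step(x, rules)}
        language |= {y for y in forms if all(c in sigma for c in y)}
    return language


# The grammar with rules S -> aSb, S -> bSa, S -> ε used in Section 11.1.
G = ({"S", "a", "b"}, {"a", "b"}, [("S", "aSb"), ("S", "bSa"), ("S", "")], "S")
print(sorted(generated(G, 3), key=lambda w: (len(w), w)))
# ['', 'ab', 'ba', 'aabb', 'abab', 'baba', 'bbaa']
```

Raising max_steps yields longer strings. No finite bound enumerates all of L(G) when the language is infinite.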

A  language  L  is  context-free  iff  it  is  generated  by  some  context-free  grammar  G. 
The  context-free  languages  (or  CFLs)  are  a  proper  superset  of  the  regular  languages. 
In the next several examples, we will see languages that are context-free but not
regular. Then, in Chapter 13, we will prove the other part of this claim, namely that every
regular  language  is  also  context-free. 

EXAMPLE 11.1 The Balanced Parentheses Language

Consider Bal = {w ∈ {), (}* : the parentheses are balanced}. We showed in
Example 8.10 that Bal is not regular. But it is context-free because it can be
generated by the grammar G = ({S, ), (}, {), (}, R, S), where:

R = {S → (S)
     S → SS
     S → ε}.

Some example derivations in G:

S ⇒ (S) ⇒ ().

S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ (()S) ⇒ (()(S)) ⇒ (()()).

So, S ⇒* () and S ⇒* (()()).
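The derivations above can be checked mechanically. The following Python sketch (the names is_step and is_derivation are hypothetical, and the empty string stands for ε) tests whether each working string follows from the previous one by a single application of one of the three rules of G.

```python
RULES = [("S", "(S)"), ("S", "SS"), ("S", "")]


def is_step(x, y, rules):
    """True iff y results from replacing one nonterminal in x by a rule's right side."""
    return any(
        x[i] == left and x[:i] + right + x[i + 1:] == y
        for i in range(len(x))
        for left, right in rules
    )


def is_derivation(steps, rules):
    """True iff every consecutive pair of working strings is a legal step."""
    return all(is_step(x, y, rules) for x, y in zip(steps, steps[1:]))


print(is_derivation(["S", "(S)", "(SS)", "((S)S)", "(()S)", "(()(S))", "(()())"], RULES))
# True
print(is_derivation(["S", "()"], RULES))
# False: S ⇒* () needs two steps, S ⇒ (S) ⇒ ()
```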
The syntax of Boolean query languages is describable with a context-free
grammar. (Q.1.1)


EXAMPLE  11.2  AnBn 

Consider AnBn = {a^n b^n : n ≥ 0}. We showed in Example 8.8 that AnBn is not
regular. But it is context-free because it can be generated by the grammar
G = ({S, a, b}, {a, b}, R, S), where:

R = {S → aSb
     S → ε}.


What is it about context-free grammars that gives them the power to define
languages like Bal and AnBn?

We can begin answering that question by defining a rule in a grammar G to be
recursive iff it is of the form X → w1Yw2, where Y ⇒_G* w3Xw4 and all of w1, w2, w3,
and w4 may be any element of V*. A grammar is recursive iff it contains at least one
recursive rule. For example, the grammar we just presented for Bal is recursive because it
contains the rule S → (S). The grammar we presented for AnBn is recursive because it
contains the rule S → aSb. A grammar that contained the rule S → aS would also be
recursive. So the regular grammar whose rules are {S → aT, T → aW, W → aS, W → a}
is recursive. Recursive rules make it possible for a finite grammar to generate an
infinite set of strings.

Let's now look at an important property that gives context-free grammars the
power to define languages that aren't regular. A rule in a grammar G is self-embedding
iff it is of the form X → w1Yw2, where Y ⇒_G* w3Xw4 and both w1w3 and w4w2 are in
Σ+. A grammar is self-embedding iff it contains at least one self-embedding rule. So
now we require that a nonempty string be generated on each side of the nested X. The
grammar we presented for Bal is self-embedding because it contains the rule S → (S).
The grammar we presented for AnBn is self-embedding because it contains the rule
S → aSb. The presence of a rule like S → aS does not by itself make a grammar
self-embedding. But the rule S → aT is self-embedding in any grammar G that also
contains the rule T → Sb, since S → aT and T ⇒_G* Sb. Self-embedding grammars are
able to define languages like Bal, AnBn, and others whose strings must contain pairs of
matching regions, often of the form uv^i xy^i z. No regular language can impose such a
requirement on its strings.

The fact that a grammar G is self-embedding does not guarantee that L(G) isn't regular.
There might be a different grammar G' that also defines L(G) and that is not
self-embedding. For example, G1 = ({S, a}, {a}, {S → ε, S → a, S → aSa}, S) is
self-embedding, yet it defines the regular language a*. However, we note the following two
important facts:

• If a grammar G is not self-embedding then L(G) is regular. Recall that our definition
of regular grammars did not allow self-embedding.

• If a language L has the property that every grammar that defines it is self-embedding,
then L is not regular.

The  rest  of  the  grammars  that  we  will  present  in  this  chapter  are  self-embedding. 


EXAMPLE  11.3  Even  Length  Palindromes 

Consider PalEven = {ww^R : w ∈ {a, b}*}, the language of even-length palindromes
of a's and b's. We showed in Example 8.11 that PalEven is not regular. But it is
context-free because it can be generated by the grammar G = ({S, a, b}, {a, b}, R, S),
where:

R = {S → aSa
     S → bSb
     S → ε}.


EXAMPLE  11.4  Equal  Numbers  of  a's  and  b’s 

Let L = {w ∈ {a, b}* : #a(w) = #b(w)}. We showed in Example 8.14 that L is
not regular. But it is context-free because it can be generated by the grammar
G = ({S, a, b}, {a, b}, R, S), where:

R = {S → aSb
     S → bSa
     S → SS
     S → ε}.


These  simple  examples  are  interesting  because  they  capture,  in  a  couple  of  lines,  the 
power  of  the  context-free  grammar  formalism.  But  our  real  interest  in  context-free 
grammars  comes  from  the  fact  that  they  can  describe  useful  and  powerful  languages 
that  are  substantially  more  complex. 

It quickly becomes apparent, when we start to build larger grammars, that we need
a  more  flexible  grammar-writing  notation.  We'll  use  the  following  two  extensions  when 
they  are  helpful: 

• The symbol | should be read as “or”. It allows two or more rules to be collapsed
into one. So the following single rule is equivalent to the four rules we wrote in
Example 11.4:

S → aSb | bSa | SS | ε

• We often require nonterminal alphabets that contain more symbols than there are
letters. To solve that problem, we will allow a nonterminal symbol to be any
sequence of characters surrounded by angle brackets. So <program> and
<variable> could be nonterminal symbols using this convention.
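The | notation is purely an abbreviation, as the following Python sketch suggests. The function name expand_alternatives is hypothetical, and the sketch assumes that a rule is written as a string using → for the arrow and ε for the empty string. It expands one abbreviated rule into the separate rules it stands for.

```python
def expand_alternatives(rule):
    """Split a rule written with '|' into one (left, right) pair per alternative."""
    left, right = rule.split("→")
    alternatives = [alt.strip() for alt in right.split("|")]
    # The text writes the empty string as ε; represent it as "" here.
    return [(left.strip(), "" if alt == "ε" else alt) for alt in alternatives]


print(expand_alternatives("S → aSb | bSa | SS | ε"))
# [('S', 'aSb'), ('S', 'bSa'), ('S', 'SS'), ('S', '')]
```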
BNF (or Backus Naur form) is a widely used grammatical formalism that
exploits both of these extensions. It was created in the late 1950s as a way to
describe the programming language ALGOL 60. It has since been extended
and several dialects developed. (G.1.1)


EXAMPLE  11.5  BNF  for  a  Small  Java  Fragment 

Because BNF was originally designed when only a small character set was available,
it uses the three-symbol sequence ::= in place of →. The following BNF-style
grammar describes a highly simplified and very small subset of Java:

<block> ::= {<stmt-list>} | {}

<stmt-list> ::= <stmt> | <stmt-list> <stmt>

<stmt> ::= <block> | while (<cond>) <stmt> |
           if (<cond>) <stmt> |
           do <stmt> while (<cond>); | <assignment-stmt>; |
           return | return <expression> |
           <method-invocation>;

The  rules  of  this  grammar  make  it  clear  that  the  following  block  may  be  legal  in 
Java  (assuming  that  the  appropriate  declarations  have  occurred): 

{ while (x < 12) {
     hippo.pretend(x);
     x = x + 2;
}}

On  the  other  hand,  the  following  block  is  not  legal: 

{ while x < 12}) (
     hippo.pretend(x);
     x = x + 2;
}}


Many other kinds of practical languages are also context-free. For example,
HTML can be described with a BNF-style context-free grammar. (Q.1.2)


EXAMPLE  11.6  A  Fragment  of  an  English  Grammar 

Much of the structure of an English sentence can be described by a (large)
context-free grammar. For historical reasons, linguistic grammars typically use a
slightly different notational convention. Nonterminals will be written as strings
whose first symbol is an upper case letter. So the following grammar describes a
tiny fragment of English. The symbol NP will derive noun phrases; the symbol VP
will derive verb phrases:

S → NP VP

NP → the Nominal | a Nominal | Nominal | ProperNoun | NP PP

Nominal → N | Adjs N

N → cat | dogs | bear | girl | chocolate | rifle

ProperNoun → Chris | Fluffy

Adjs → Adj Adjs | Adj

Adj → young | older | smart

VP → V | V NP | VP PP

V → like | likes | thinks | shot | smells

PP → Prep NP

Prep → with
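To get a feel for what this fragment generates, one can expand S by randomly chosen rules. The following Python sketch is an illustration only. The dictionary encoding, the function name generate, and the max_depth cutoff are assumptions of the sketch, not part of the grammar. The cutoff is needed because rules such as NP → NP PP allow derivations of unbounded length.

```python
import random

GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "Nominal"], ["a", "Nominal"], ["Nominal"],
           ["ProperNoun"], ["NP", "PP"]],
    "Nominal": [["N"], ["Adjs", "N"]],
    "N": [["cat"], ["dogs"], ["bear"], ["girl"], ["chocolate"], ["rifle"]],
    "ProperNoun": [["Chris"], ["Fluffy"]],
    "Adjs": [["Adj", "Adjs"], ["Adj"]],
    "Adj": [["young"], ["older"], ["smart"]],
    "VP": [["V"], ["V", "NP"], ["VP", "PP"]],
    "V": [["like"], ["likes"], ["thinks"], ["shot"], ["smells"]],
    "PP": [["Prep", "NP"]],
    "Prep": [["with"]],
}


def generate(symbol, rng, depth=0, max_depth=6):
    """Expand `symbol` by randomly chosen rules; return a list of words."""
    if symbol not in GRAMMAR:          # a terminal (an English word)
        return [symbol]
    options = GRAMMAR[symbol]
    if depth >= max_depth:
        # Assumption of this sketch: past max_depth, avoid right-hand sides
        # that mention the symbol itself, so that the expansion terminates.
        options = [rhs for rhs in options if symbol not in rhs]
    rhs = rng.choice(options)
    return [w for s in rhs for w in generate(s, rng, depth + 1, max_depth)]


print(" ".join(generate("S", random.Random(0))))
```

Some of the generated strings, such as "a dogs likes", are not acceptable English. The grammar as written does not enforce number agreement, which is one reason a realistic grammar of English must be much larger.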


Is  English  (or  German  or  Chinese)  really  context-free?  (L.3.3) 


11.3  Designing  Context-Free  Grammars 

In  this  section,  we  offer  a  few  simple  strategies  for  designing  straightforward  context- 
free grammars. Later we'll see that some grammars are better than others (for various
reasons) and we'll look at techniques for finding “good” grammars. For now, we will
focus  on  finding  some  grammar. 

The most important rule to remember in designing a context-free grammar to
generate a language L is the following:


•  If  L  has  the  property  that  every  string  in  it  has  two  regions  and  those  regions  must 
bear  some  relationship  to  each  other  (such  as  being  of  the  same  length),  then  the 
two  regions  must  be  generated  in  tandem.  Otherwise,  there  is  no  way  to  enforce  the 
necessary  constraint. 

Keeping  that  rule  in  mind,  there  are  two  simple  ways  to  generate  strings: 

•  To  generate  a  string  with  multiple  regions  that  must  occur  in  some  fixed  order  but 
do  not  have  to  correspond  to  each  other,  use  a  rule  of  the  form: 


A → BC ...

This rule generates two regions, and the grammar that contains it will then rely on
additional rules to describe how to form a B region and how to form a C region.
Longer rules, like A → BCDE, can be used if additional regions are necessary.
• To generate a string with two regions that must occur in some fixed order and that
must  correspond  to  each  other,  start  at  the  outside  edges  of  the  string  and  generate 
toward  the  middle.  If  there  is  an  unrelated  region  in  between  the  related  ones,  it 
must  be  generated  after  the  related  regions  have  been  produced. 


The  outside-in  structure  of  context-free  grammars  makes  them  well  suited  to 
describing  physical  things,  like  RNA  molecules,  that  fold.  (K.4) 


EXAMPLE  11.7  Concatenating  Independent  Sublanguages 

Let L = {a^n b^n c^m : n, m ≥ 0}. Here, the c^m portion of any string in L is completely
independent of the a^n b^n portion, so we should generate the two portions separately
and concatenate them together. So let G = ({S, N, C, a, b, c}, {a, b, c}, R, S), where:

R = { S → NC      /* Generate the two independent portions.
      N → aNb     /* Generate the a^n b^n portion, from the outside in.
      N → ε
      C → cC      /* Generate the c^m portion.
      C → ε }.
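The rules of Example 11.7 can be mirrored directly in code, with one function per nonterminal. The following is a minimal sketch; the function names gen_N, gen_C, and gen_S are illustrative assumptions rather than anything defined in the text, and the arguments n and m fix how many times the recursive rules N → aNb and C → cC are applied.

```python
def gen_N(n: int) -> str:
    """Apply N -> aNb n times, then N -> epsilon (outside-in generation)."""
    return "" if n == 0 else "a" + gen_N(n - 1) + "b"


def gen_C(m: int) -> str:
    """Apply C -> cC m times, then C -> epsilon."""
    return "" if m == 0 else "c" + gen_C(m - 1)


def gen_S(n: int, m: int) -> str:
    """Apply S -> NC: the two regions are generated independently, then concatenated."""
    return gen_N(n) + gen_C(m)


print(gen_S(2, 3))  # aabbccc
```

Because gen_N wraps each recursive call between an a and a b, the two related regions grow in tandem, while gen_C never needs to know the value of n.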


EXAMPLE  11.8  The  Kleene  Star  of  a  Language 


Let L = {a^(n1) b^(n1) a^(n2) b^(n2) ... a^(nk) b^(nk) : k ≥ 0 and ∀i (n_i ≥ 0)}. For example,
the following strings are in L: ε, abab, aabbaaabbbabab. Note that L = {a^n b^n : n ≥ 0}*,
which gives a clue how to write the grammar we need. We know how to produce
individual elements of {a^n b^n : n ≥ 0}, and we know how to concatenate regions
together. So a solution is G = ({S, M, a, b}, {a, b}, R, S), where:

R = { S → MS      /* Each M will generate one {a^n b^n : n ≥ 0} region.
      S → ε
      M → aMb     /* Generate one region.
      M → ε }.


11.4 Simplifying Context-Free Grammars •

In this section, we present two algorithms that may be useful for simplifying
context-free grammars.

Consider the grammar G = ({S, A, B, C, D, a, b}, {a, b}, R, S), where:

R = { S → AB | AC
      A → aAb | ε
      B → bA
      C → bCa
      D → AB }.

G contains two useless variables: C is useless because it is not able to generate any
strings in Σ*. (Every time a rule is applied to a C, a new C is added.) D is useless
because it is unreachable, via any derivation, from S. So any rules that mention either C
or D can be removed from G without changing the language that is generated. We
present two algorithms, one to find and remove variables like C that are unproductive,
and one to find and remove variables like D that are unreachable.

Given a grammar G = (V, Σ, R, S), we define removeunproductive(G) to create a
new grammar G', where L(G') = L(G) and G' does not contain any unproductive
symbols. Rather than trying to find the unproductive symbols directly, removeunproductive
will find and mark all the productive ones. Any that are left unmarked at the end are
unproductive. Initially, all terminal symbols will be marked as productive since each of
them generates a terminal string (itself). A nonterminal symbol will be marked as
productive when it is discovered that there is at least one way to rewrite it as a sequence
of productive symbols. So removeunproductive effectively moves backwards from
terminals, marking nonterminals along the way.

removeunproductive(G: CFG) =

1. G' = G.

2. Mark every nonterminal symbol in G' as unproductive.

3. Mark every terminal symbol in G' as productive.

4. Until one entire pass has been made without any new symbol being
   marked do:

      For each rule X → α in R do:

         If every symbol in α has been marked as productive and X has not yet
         been marked as productive, then mark X as productive.

5. Remove from V_G' every unproductive symbol.

6. Remove from R_G' every rule with an unproductive symbol on either the
   left-hand side or the right-hand side.

7. Return G'.


Removeunproductive must halt because there is only some finite number of
nonterminals that can be marked as productive. So the maximum number of times it can
execute step 4 is |V − Σ|. Clearly L(G') ⊆ L(G) since G' can produce no derivations
that G could not have produced. And L(G') = L(G) because the only derivations
that G can perform but G' cannot are those that do not end with a terminal string.

Notice that it is possible that S is unproductive. This will happen precisely in case
L(G) = ∅. We will use this fact in Section 14.1.2 to show the existence of a procedure
that decides whether or not a context-free language is empty.
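The marking procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: the representation of rules as (left-hand side, tuple of right-hand-side symbols) pairs, with the empty tuple standing for ε, and the name remove_unproductive are choices made here for the example. The grammar is the one given at the start of this section.

```python
def remove_unproductive(nonterminals, terminals, rules):
    """Return (productive nonterminals, surviving rules).

    rules is a list of (lhs, rhs) pairs; rhs is a tuple of symbols,
    and the empty tuple stands for epsilon.
    """
    productive = set(terminals)            # step 3: every terminal is productive
    changed = True
    while changed:                         # step 4: stop after a pass that marks nothing
        changed = False
        for lhs, rhs in rules:
            if lhs not in productive and all(s in productive for s in rhs):
                productive.add(lhs)
                changed = True
    kept_nonterminals = {x for x in nonterminals if x in productive}    # step 5
    kept_rules = [(lhs, rhs) for lhs, rhs in rules                      # step 6
                  if lhs in productive and all(s in productive for s in rhs)]
    return kept_nonterminals, kept_rules


# S -> AB | AC, A -> aAb | epsilon, B -> bA, C -> bCa, D -> AB
NONTERMINALS = {"S", "A", "B", "C", "D"}
TERMINALS = {"a", "b"}
RULES = [
    ("S", ("A", "B")), ("S", ("A", "C")),
    ("A", ("a", "A", "b")), ("A", ()),
    ("B", ("b", "A")),
    ("C", ("b", "C", "a")),
    ("D", ("A", "B")),
]

kept, kept_rules = remove_unproductive(NONTERMINALS, TERMINALS, RULES)
language_is_empty = "S" not in kept        # S is unproductive exactly when L(G) is empty
```

On this grammar only C is dropped, together with the rules S → AC and C → bCa. D survives this pass because it is productive; it is merely unreachable.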

Next we'll define an algorithm for getting rid of unreachable symbols like D in the
grammar we presented above. Given a grammar G = (V, Σ, R, S), we define
removeunreachable(G) to create a new grammar G', where L(G') = L(G) and G'
does not contain any unreachable nonterminal symbols. What removeunreachable does
is to move forward from S, marking reachable symbols along the way.

removeunreachable(G: CFG) =

1. G' = G.

2. Mark S as reachable.

3. Mark every other nonterminal symbol as unreachable.

4. Until one entire pass has been made without any new symbol being marked do:

      For each rule X → αAβ (where A ∈ V − Σ and α, β ∈ V*) in R do:

         If X has been marked as reachable and A has not, then mark A as
         reachable.

5. Remove from V_G' every unreachable symbol.

6. Remove from R_G' every rule with an unreachable symbol on the left-hand side.

7. Return G'.

Removeunreachable must halt because there is only some finite number of
nonterminals that can be marked as reachable. So the maximum number of times it can
execute step 4 is |V − Σ|. Clearly L(G') ⊆ L(G) since G' can produce no derivations
that G could not have produced. And L(G') = L(G) because every derivation that
can be produced by G can also be produced by G'.
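The forward marking pass can be sketched the same way. As before, the rule representation ((lhs, rhs-tuple) pairs) and the function name remove_unreachable are assumptions made for this illustration. The input below is the example grammar of this section with the unproductive variable C already removed.

```python
def remove_unreachable(nonterminals, rules, start):
    """Return (reachable nonterminals, rules whose left-hand side is reachable)."""
    reachable = {start}                        # step 2: S is reachable
    changed = True
    while changed:                             # step 4: stop after a pass that marks nothing
        changed = False
        for lhs, rhs in rules:
            if lhs not in reachable:
                continue
            for sym in rhs:
                if sym in nonterminals and sym not in reachable:
                    reachable.add(sym)
                    changed = True
    kept_rules = [(lhs, rhs) for lhs, rhs in rules if lhs in reachable]   # step 6
    return reachable, kept_rules


# S -> AB, A -> aAb | epsilon, B -> bA, D -> AB
NONTERMINALS = {"S", "A", "B", "D"}
RULES = [
    ("S", ("A", "B")),
    ("A", ("a", "A", "b")), ("A", ()),
    ("B", ("b", "A")),
    ("D", ("A", "B")),
]

reachable, kept_rules = remove_unreachable(NONTERMINALS, RULES, "S")
```

Only rules with an unreachable left-hand side need to be deleted: a rule whose left-hand side is reachable cannot mention an unreachable symbol on its right-hand side.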

11.5  Proving  That  a  Grammar  is  Correct  • 

In the last couple of sections, we described some techniques that are useful in designing
context-free languages and we argued that the grammars that we built were correct
(i.e., that they correctly describe languages with certain properties). But, given some
language L and a grammar G, can we actually prove that G is correct (i.e., that it
generates exactly the strings in L)? To do so, we need to prove two things:

1.  G  generates  only  strings  in  L,  and 

2.  G  generates  all  the  strings  in  L. 

The most straightforward way to do step 1 is to imagine the process by which G
generates a string as the following loop (a version of simple-rewrite, using st in place of
working-string):

1.  st  =  S. 

2.  Until  no  nonterminals  are  left  in  st  do: 

Apply  some  rule  in  R  to  st. 

3.  Output  st. 

Then  we  construct  a  loop  invariant  I  and  show  that: 

• I is true when the loop begins,

• I is maintained at each step through the loop (i.e., by each rule application), and

• I ∧ (st contains only terminal symbols) → st ∈ L.

Step 2 is generally done by induction on the length of the generated strings.


EXAMPLE 11.9 The Correctness of the AnBn Grammar

In Example 11.2, we considered the language AnBn. We built for it the grammar
G = ({S, a, b}, {a, b}, R, S), where:

R = { S → aSb     (1)
      S → ε }.    (2)

We now show that G is correct. We first show that every string w in L(G) is in
AnBn: Let st be the working string at any point in a derivation in G. We need to
define I so that it captures the two features of every string in AnBn: The number of
a's equals the number of b's and the letters are in the correct order. So we let I be:

(#a(st) = #b(st)) ∧ (st ∈ a*(S ∪ ε)b*).

Now  we  prove: 

• I is true when st = S: In this case, #a(st) = #b(st) = 0 and st is of the correct
form.

• If I is true before a rule fires, then it is true after the rule fires: To prove this,
we consider the rules one at a time and show that each of them preserves I.
Rule (1) adds one a and one b to st, so it does not change the difference
between the number of a's and the number of b's. Further, it adds the a to the left
of S and the b to the right of S, so if the form constraint was satisfied before
applying the rule it still is afterwards. Rule (2) adds nothing so it does not change
either the number of a's or b's or their locations.

• If I is true and st contains only terminal symbols, then st ∈ AnBn: In this case, st
possesses the three properties required of all strings in AnBn: They are
composed only of a's and b's, #a(st) = #b(st), and all a's come before all b's.

Next we show that every string w in AnBn can be generated by G: Every
string in AnBn is of even length, so we will prove the claim only for strings of even
length. The proof is by induction on |w|:

• Base case: If |w| = 0, then w = ε, which can be generated by applying rule
(2) to S.

• Prove: If every string in AnBn of length k, where k is even, can be generated by
G, then every string in AnBn of length k + 2 can also be generated. Notice
that, for any even k, there is exactly one string in AnBn of length k: a^(k/2) b^(k/2).
There is also only one string of length k + 2, namely a a^(k/2) b^(k/2) b, that can be
generated by first applying rule (1) to produce aSb, and then applying to S
whatever rule sequence generated a^(k/2) b^(k/2). By the induction hypothesis, such a
sequence must exist.


EXAMPLE 11.10 The Correctness of the Equal a's and b's Grammar

In Example 11.4 we considered the language L = {w ∈ {a, b}* : #a(w) = #b(w)}.

We built for it the grammar G = ({S, a, b}, {a, b}, R, S), where:

R = { S → aSb     (1)
      S → bSa     (2)
      S → SS      (3)
      S → ε }.    (4)

This time it is perhaps less obvious that G is correct. In particular, does it
generate every sequence where the number of a's equals the number of b's? The
answer is yes, which we now prove.

To  make  it  easy  to  describe  this  proof,  we  define  the  following  function: 

A(w) = #a(w) − #b(w).

Note that a string w is in L iff w ∈ {a, b}* and A(w) = 0.

We begin by showing that every string w in L(G) is in L: Again, let st be the
working string at any point in a derivation in G. Let I be:

st ∈ {a, b, S}* ∧ A(st) = 0.

Now  we  prove: 

• I is true when st = S: In this case, #a(st) = #b(st) = 0. So A(st) = 0.

• If I is true before a rule fires, then it is true after the rule fires: The only
symbols that can be added by any rule are a, b, and S. Rules (1) and (2) each add
one a and one b to st, so neither of them changes A(st). Rules (3) and (4) add
neither a's nor b's to the working string, so A(st) does not change.

• If I is true and st contains only terminal symbols, then st ∈ L: In this case, st
possesses the two properties required of all strings in L: They are composed
only of a's and b's and A(st) = 0.

It is perhaps less obviously true that G generates every string in L. Can we be sure
that there are no permutations that it misses? Yes, we can. We next show that
every string w in L can be generated by G. Every string in L is of even length, so we
will prove the claim only for strings of even length. The proof is by induction on |w|.

• Base case: If |w| = 0, w = ε, which can be generated by applying rule (4) to S.

• Prove that if every string in L of length ≤ k, where k is even, can be generated
by G, then every string w in L of length k + 2 can also be generated: Since w
has length k + 2, it can be rewritten as one of the following: axb, bxa, axa, or
bxb, for some x ∈ {a, b}*, |x| = k. We consider two cases:

   • w = axb or bxa. If w ∈ L, then A(w) = 0 and so A(x) must also be 0.
   |x| = k. So, by the induction hypothesis, G generates x. Thus G can also
   generate w: It first applies either rule (1) (if w = axb) or rule (2) (if w =
   bxa). It then applies to S whatever rule sequence generated x. By the
   induction hypothesis, such a sequence must exist.



   • w = axa or bxb. We consider the former case. The argument is parallel for
   the latter. Note that any string in L, of either of these forms, must have
   length at least 4. We will show that w = vy, where both v and y are in L,
   2 ≤ |v| ≤ k, and 2 ≤ |y| ≤ k. If that is so, then G can generate w by first
   applying rule (3) to produce SS, and then generating v from the first S and
   y from the second S. By the induction hypothesis, it must be possible for it
   to do that since both v and y have length ≤ k.

To find v and y, we can imagine building w (which we've rewritten as axa)
up by concatenating one character at a time on the right. After adding only
one character, we have just a. A(a) = 1. Since w ∈ L, A(w) = 0. So
A(ax) = −1 (since it is missing the final a of w). The value of A changes by
exactly 1 each time a symbol is added to a string. Since A is positive when only a
single character has been added and becomes negative by the time the string
ax has been built, it must at some point before then have been 0. Let v be the
shortest nonempty prefix of w to have a value of 0 for A. Since v is nonempty
and only even length strings can have A equal to 0, 2 ≤ |v|. Since A became 0
sometime before w became ax, v must be at least two characters shorter than
w (it must be missing at least the last character of x plus the final a), so
|v| ≤ k. Since A(v) = 0, v ∈ L. Since w = vy, we know bounds on the
length of y: 2 ≤ |y| ≤ k. Since A(w) = 0 and A(v) = 0, A(y) must also be 0
and so y ∈ L.
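The induction in Example 11.10 is constructive, so it can be turned into a procedure that builds a derivation for any string with equal numbers of a's and b's. The sketch below is an illustration only: the names diff (standing for the difference function of the example), rule_sequence, and replay are assumptions introduced here. rule_sequence follows the two cases of the proof and returns the rule numbers of a leftmost derivation; replay applies those rules to the leftmost S as a check.

```python
RULES = {1: "aSb", 2: "bSa", 3: "SS", 4: ""}    # rules (1)-(4) of Example 11.10


def diff(w: str) -> int:
    """The difference #a(w) - #b(w) used in the proof."""
    return w.count("a") - w.count("b")


def rule_sequence(w: str) -> list:
    """Return the rule numbers of a leftmost derivation of w, following the induction."""
    if set(w) - {"a", "b"} or diff(w) != 0:
        raise ValueError("w is not in L")
    if w == "":
        return [4]                                   # base case: S -> epsilon
    if w[0] != w[-1]:                                # case w = axb or w = bxa
        return [1 if w[0] == "a" else 2] + rule_sequence(w[1:-1])
    for i in range(2, len(w), 2):                    # case w = axa or bxb: shortest prefix v
        if diff(w[:i]) == 0:
            return [3] + rule_sequence(w[:i]) + rule_sequence(w[i:])
    raise AssertionError("unreachable if the argument in the text holds")


def replay(seq: list) -> str:
    """Apply the numbered rules, in order, to the leftmost S of the working string."""
    st = "S"
    for r in seq:
        st = st.replace("S", RULES[r], 1)
    return st


print(rule_sequence("abba"))    # [3, 1, 4, 2, 4]
```

For abba the first and last characters agree, so the procedure splits the string into v = ab and y = ba, applies rule (3), and then derives each half with the first case of the proof.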


11.6 Derivations and Parse Trees

Context-free grammars do more than just describe the set of strings in a language.
They provide a way of assigning an internal structure to the strings that they derive.
This structure is important because it, in turn, provides the starting point for assigning
meanings to the strings that the grammar can produce.

The grammatical structure of a string is captured by a parse tree, which records
which rules were applied to which nonterminals during the string's derivation. In
Chapter 15, we will explore the design of programs, called parsers, that, given a
grammar G and a string w, decide whether w ∈ L(G) and, if it is, create a parse tree that
captures the process by which G could have derived w.

A parse tree, derived by a grammar G = (V, Σ, R, S), is a rooted, ordered tree in which:

• Every leaf node is labeled with an element of Σ ∪ {ε},

• The root node is labeled S,

• Every other node is labeled with some element of V − Σ, and

• If m is a nonleaf node labeled X and the children of m are labeled x1, x2, ..., xn,
then R contains the rule X → x1 x2 ... xn.

Define the branching factor of a grammar G to be the length (the number of symbols)
of the longest right-hand side of any rule in G. Then the branching factor of any parse
tree generated by G is less than or equal to the branching factor of G.


EXAMPLE  11.11  The  Parse  Tree  of  a  Simple  English  Sentence 

Consider  again  the  fragment  of  an  English  grammar  that  we  wrote  in  Example  11.6. 
That  grammar  can  be  used  to  produce  the  following  parse  tree  for  the  sentence 
the  smart  cat  smells  chocolate: 


Notice that, in Example 11.11, the constituents (the subtrees) correspond to objects
(like some particular cat) that have meaning in the world that is being described. It is
clear from the tree that this sentence is not about cat smells or smart cat smells.

Because parse trees matter, it makes sense, given a grammar G, to distinguish between:

• G's weak generative capacity, defined to be the set of strings, L(G), that G
generates, and

• G's strong generative capacity, defined to be the set of parse trees that G generates.

When  we  design  grammars  it  will  be  important  that  we  consider  both  their  weak  and 
their  strong  generative  capacities. 

In our last example, the process of deriving the sentence the smart cat smells
chocolate began with:

S ⇒ NP VP ⇒ ...

Looking at the parse tree, it isn't possible to tell which of the following happened next:

S ⇒ NP VP ⇒ the Nominal VP ⇒

S ⇒ NP VP ⇒ NP V NP ⇒

Parse  trees  are  useful  precisely  because  they  capture  the  important  structural  facts 
about  a  derivation  but  throw  away  the  details  of  the  order  in  which  the  nonterminals 
were  expanded. 

While it's true that the order in which nonterminals are expanded has no bearing
on the structure that we wish to assign to a string, order will become important when
we attempt to define algorithms that work with context-free grammars. For example,
in Chapter 15 we will consider various parsing algorithms for context-free languages.
Given an input string w, such algorithms must work systematically through the space
of possible derivations in search of one that could have generated w. To make it
easier to describe such algorithms, we will define two useful families of derivations:

•  A  left-most  derivation  is  one  in  which,  at  each  step,  the  leftmost  nonterminal  in  the 
working  string  is  chosen  for  expansion. 

•  A  right-most  derivation  is  one  in  which,  at  each  step,  the  rightmost  nonterminal  in 
the  working  string  is  chosen  for  expansion. 

Returning  to  the  smart  cat  example  above: 

•  A  left-most  derivation  is: 

S ⇒ NP VP ⇒ the Nominal VP ⇒ the Adjs N VP ⇒ the Adj N VP ⇒
the smart N VP ⇒ the smart cat VP ⇒ the smart cat V NP ⇒
the smart cat smells NP ⇒ the smart cat smells Nominal ⇒
the smart cat smells N ⇒ the smart cat smells chocolate

•  A  right-most  derivation  is: 

S ⇒ NP VP ⇒ NP V NP ⇒ NP V Nominal ⇒ NP V N ⇒ NP V chocolate ⇒
NP smells chocolate ⇒ the Nominal smells chocolate ⇒
the Adjs N smells chocolate ⇒ the Adjs cat smells chocolate ⇒
the Adj cat smells chocolate ⇒ the smart cat smells chocolate
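Both derivations correspond to the same parse tree, and either one can be read back off that tree by choosing which pending nonterminal to expand at each step. The following sketch illustrates this. The nested (label, children) encoding, the tree shape (read off the derivations given here for the smart cat example), and the names TREE, render, and derivation are all assumptions made for the illustration.

```python
# A node is (label, children); a leaf is a plain string.
TREE = ("S", [
    ("NP", ["the", ("Nominal", [("Adjs", [("Adj", ["smart"])]), ("N", ["cat"])])]),
    ("VP", [("V", ["smells"]), ("NP", [("Nominal", [("N", ["chocolate"])])])]),
])


def render(form):
    """Write a working string, showing unexpanded nodes by their labels."""
    return " ".join(n if isinstance(n, str) else n[0] for n in form)


def derivation(tree, leftmost=True):
    """List the working strings of the leftmost (or rightmost) derivation for tree."""
    form = [tree]
    steps = [render(form)]
    while True:
        pending = [i for i, n in enumerate(form) if not isinstance(n, str)]
        if not pending:
            return steps
        i = pending[0] if leftmost else pending[-1]
        form[i:i + 1] = form[i][1]        # replace the chosen node by its children
        steps.append(render(form))


for step in derivation(TREE, leftmost=False):
    print(step)
```

The tree fixes which rule is applied to each nonterminal; only the order of expansion differs between the two calls, which is exactly the information a parse tree throws away.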


11.7  Ambiguity 

Sometimes a grammar may produce more than one parse tree for some (or all) of the
strings it generates. When this happens, we say that the grammar is ambiguous. More
precisely, a grammar G is ambiguous iff there is at least one string in L(G) for which G
produces more than one parse tree. It is easy to write ambiguous grammars if we are
not careful. In fact, we already have.


EXAMPLE  11.12  The  Balanced  Parentheses  Grammar  is  Ambiguous 

Recall the language Bal = {w ∈ {), (}* : the parentheses are balanced}, for
which we wrote the grammar G = ({S, ), (}, {), (}, R, S), where:

R = { S → (S)
      S → SS
      S → ε }.

G can produce both of the following parse trees for the string (())():

In fact, G can produce an infinite number of parse trees for the string (())().


A grammar G is unambiguous iff, for all strings w, at every point in a leftmost or
rightmost derivation of w, only one rule in G can be applied. The grammar that we just
presented in Example 11.12 clearly fails to meet this requirement. For example, here
are two leftmost derivations of the string (())():

• S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ (())S ⇒ (())(S) ⇒ (())().

• S ⇒ SS ⇒ SSS ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ (())S ⇒ (())(S) ⇒ (())().
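One way to see the ambiguity concretely is to count leftmost derivations of (())() with a bounded number of rule applications. The sketch below is a brute-force illustration; the function name count_leftmost and the step bound are assumptions introduced here. It tries every rule on the leftmost S and prunes working strings that can no longer match the target.

```python
RHS = ["(S)", "SS", ""]     # right-hand sides of S -> (S), S -> SS, S -> epsilon


def count_leftmost(target: str, max_steps: int, form: str = "S") -> int:
    """Count leftmost derivations of target from form that use at most max_steps rules."""
    i = form.find("S")
    if i == -1:                                   # no nonterminal left
        return 1 if form == target else 0
    if max_steps == 0:
        return 0
    if not target.startswith(form[:i]):           # terminals left of the first S are fixed
        return 0
    if len(form.replace("S", "")) > len(target):  # already too many terminals
        return 0
    return sum(count_leftmost(target, max_steps - 1, form[:i] + rhs + form[i + 1:])
               for rhs in RHS)


for bound in (6, 8, 10):
    print(bound, count_leftmost("(())()", bound))
```

With a bound of 6 only the first derivation shown above is found; raising the bound admits derivations, like the second one, that introduce extra instances of S and then erase them, and the count keeps growing as the bound grows.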


11.7.1 Why Is Ambiguity a Problem?

Why are we suddenly concerned with ambiguity? Regular grammars can also be
ambiguous. And regular expressions can often derive a single string in several distinct ways.


EXAMPLE  11.13  Regular  Expressions  and  Grammars  Can  Be  Ambiguous 


Let L = {w ∈ {a, b}* : w contains at least one a}. L is regular. It can be defined
with both a regular expression and a regular grammar. We show two ways in
which the string aaa can be generated from the regular expression we have
written and two ways in which it can be generated by the regular grammar:

Regular Expression

(a ∪ b)*a(a ∪ b)*.

choose a from (a ∪ b), then
choose a from (a ∪ b), then
choose a, then
choose ε from (a ∪ b)*.

or

choose ε from (a ∪ b)*, then
choose a, then
choose a from (a ∪ b), then
choose a from (a ∪ b).

Regular Grammar

S → a
S → bS
S → aS
S → aT
T → a
T → b
T → aT
T → bT

[Figure: two parse trees for aaa under the regular grammar, one that begins with
S → aS and one that begins with S → aT.]


We had no reason to be concerned with ambiguity when we were discussing
regular languages because, for most applications of them, we don't care about assigning
internal structure to strings. With context-free languages, we usually do care about
internal structure because, given a string w, we want to assign meaning to w. We
almost always want to assign a unique such meaning. It is generally difficult, if not
impossible, to assign a unique meaning without a unique parse tree. So an ambiguous
grammar, which fails to produce a unique parse tree, is a problem, as we'll see in our
next example.


EXAMPLE  11.14  An  Ambiguous  Expression  Grammar 

Consider Expr, which we'll define to be the language of simple arithmetic expressions
of the kind that could be part of anything from a small calculator to a programming
language. We can define Expr with the following context-free grammar G = ({E, id,
+, *, (, )}, {id, +, *, (, )}, R, E), where:

R = { E → E + E
      E → E * E
      E → (E)
      E → id }.


So that we can focus on the issues we care about, we've used the terminal
symbol id as a shorthand for any of the numbers or variables that can actually occur
as the operands in the expressions that G generates. Most compilers and
interpreters for expression languages handle the parsing of individual operands in a
first pass, called lexical analysis, which can be done with an FSM. We'll return to
this topic in Chapter 15.

Consider the string 2 + 3 * 5, which we will write as id + id * id. Using G, we
can get two parses for this string:


Should an evaluation of this expression return 17 or 25? (See Example 11.19
for a different expression grammar that fixes this problem.)


Natural languages, like English and Chinese, are not explicitly designed. So it
isn't possible to go in and remove ambiguity from them. See Example 11.22
and L.3.4.


Designers  of  practical  languages  must  be  careful  that  they  create  languages  for 
which  they  can  write  unambiguous  grammars. 
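Returning to the expression grammar of Example 11.14, the consequence of ambiguity for meaning can be checked mechanically. The sketch below enumerates every parse of a token list under the rules E → E + E, E → E * E, and E → id, and evaluates each tree. It is an illustration under stated assumptions: the parenthesis rule is omitted for brevity, integers play the role of id, and the names parses and evaluate are introduced here.

```python
def parses(tokens):
    """Return every parse tree for tokens under E -> E + E | E * E | id.

    A tree is either an int (an id leaf) or a tuple (operator, left, right).
    """
    if len(tokens) == 1:
        return [tokens[0]] if isinstance(tokens[0], int) else []
    trees = []
    for i, tok in enumerate(tokens):
        if tok in ("+", "*"):                     # try this operator as the root
            for left in parses(tokens[:i]):
                for right in parses(tokens[i + 1:]):
                    trees.append((tok, left, right))
    return trees


def evaluate(tree):
    """Evaluate a parse tree bottom-up."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    lval, rval = evaluate(left), evaluate(right)
    return lval + rval if op == "+" else lval * rval


values = sorted(evaluate(t) for t in parses([2, "+", 3, "*", 5]))
print(values)   # [17, 25]
```

The two trees differ only in which operator sits at the root, and that choice alone determines whether the result is 17 or 25.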

11.7.2  Inherent  Ambiguity 

In many cases, when confronted with an ambiguous grammar G, it is possible to
construct a new grammar G' that generates L(G) and that has less (or no) ambiguity.
Unfortunately, it is not always possible to do this. There exist context-free languages for
which no unambiguous grammar exists. We call such languages inherently ambiguous.


EXAMPLE  11.15  An  Inherently  Ambiguous  Language 

Let L = {a^i b^j c^k : i, j, k ≥ 0, i = j or j = k}. An alternative way to describe it is
{a^n b^n c^m : n, m ≥ 0} ∪ {a^n b^m c^m : n, m ≥ 0}. Every string in L has either (or
both) the same number of a's and b's or the same number of b's and c's. L is
inherently ambiguous. One grammar that describes it is G = ({S, S1, S2, A, B, a, b,
c}, {a, b, c}, R, S), where:

R = { S → S1 | S2
      S1 → S1 c | A     /* Generate all strings in {a^n b^n c^m : n, m ≥ 0}.
      A → aAb | ε
      S2 → a S2 | B     /* Generate all strings in {a^n b^m c^m : n, m ≥ 0}.
      B → bBc | ε }.

Now consider the strings in AnBnCn = {a^n b^n c^n : n ≥ 0}. They have two
distinct derivations, one through S1 and the other through S2. It is possible to prove
that L is inherently ambiguous: Given any grammar G that generates L there is at
least one string with two derivations in G.
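The two derivations of a string in AnBnCn can be replayed explicitly. In the sketch below, the table RULES encodes the grammar of Example 11.15, and derive applies a list of rule choices (indexes into the alternatives for the leftmost nonterminal). The encoding and the names RULES, derive, via_S1, and via_S2 are assumptions made for this illustration.

```python
# Alternatives for each nonterminal, in the order they appear in Example 11.15.
RULES = {
    "S":  [["S1"], ["S2"]],
    "S1": [["S1", "c"], ["A"]],
    "A":  [["a", "A", "b"], []],
    "S2": [["a", "S2"], ["B"]],
    "B":  [["b", "B", "c"], []],
}


def derive(choices):
    """Apply each choice to the leftmost nonterminal and return the final string."""
    form = ["S"]
    for c in choices:
        i = next(k for k, sym in enumerate(form) if sym in RULES)
        form[i:i + 1] = RULES[form[i]][c]
    return "".join(form)


via_S1 = [0, 0, 0, 1, 0, 0, 1]   # S -> S1, S1 -> S1 c (twice), S1 -> A, A -> aAb (twice), A -> epsilon
via_S2 = [1, 0, 0, 1, 0, 0, 1]   # S -> S2, S2 -> a S2 (twice), S2 -> B, B -> bBc (twice), B -> epsilon

print(derive(via_S1), derive(via_S2))
```

Both choice sequences yield aabbcc, but the first pairs the a's with the b's while the second pairs the b's with the c's, so the two parse trees differ.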


EXAMPLE  11.16  Another  Inherently  Ambiguous  Language 

Let L = {a^i b^j a^k b^l : i, j, k, l ≥ 0, i = k or j = l}. L is also inherently ambiguous.


Unfortunately,  there  are  no  clean  fixes  for  the  ambiguity  problem  for  context-free 
languages.  In  Section  22.5  we’ll  see  that  both  of  the  following  problems  are  undecidable: 

•  Given  a  context-free  grammar  G,  is  G  ambiguous? 

• Given a context-free language L, is L inherently ambiguous?

11.7.3 Techniques for Reducing Ambiguity •

Despite the negative theoretical results that we have just mentioned, it is usually very
important, when we are designing practical languages and their grammars, that we
come up with a language that is not inherently ambiguous and a grammar for it that is
unambiguous. Although there exists no general purpose algorithm to test for
ambiguity in a grammar or to remove it when it is found (since removal is not always possible),
there do exist heuristics that we can use to find some of the more common sources of
ambiguity and remove them. We'll consider here three grammar structures that often
lead to ambiguity:

1. ε-rules like S → ε.

2. Rules like S → SS or E → E + E. In other words, recursive rules whose
right-hand sides are symmetric and contain at least two copies of the nonterminal on
the left-hand side.

3. Rule sets that lead to ambiguous attachment of optional postfixes.

Eliminating ε-Rules

In Example 11.12, we showed a grammar for the balanced parentheses language. That
grammar is highly ambiguous. Its major problem is that it is possible to apply the rule
S → SS arbitrarily often, generating unnecessary instances of S, which can then be
wiped out without a trace using the rule S → ε. If we could eliminate the rule S → ε, we
could eliminate that source of ambiguity. We'll call any rule whose right-hand side is ε
an ε-rule.

We'd like to define an algorithm that could remove ε-rules from a grammar G
without changing the language that G generates. Clearly if ε ∈ L(G), that won't be
possible. Only an ε-rule can generate ε. However, it is possible to define an algorithm that
eliminates ε-rules from G and leaves L(G) unchanged except that, if ε ∈ L(G), it will
be absent from the language generated by the new grammar. We will show such an
algorithm. Then we'll show a simple way to add ε back in, when necessary, without
adding back the kind of ε-rules that cause ambiguity.

Let G = (V, Σ, R, S) be any context-free grammar. The following algorithm
constructs a new grammar G' such that L(G') = L(G) − {ε} and G' contains no ε-rules:

removeEps(G: CFG) =

1. Let G' = G.

2. Find the set N of nullable variables in G'. A variable X is nullable iff either:

   (1) there is a rule X → ε, or

   (2) there is a rule X → PQR ... such that P, Q, R, ... are all nullable.

   So compute N as follows:

   2.1. Set N to the set of variables that satisfy (1).

   2.2. Until an entire pass is made without adding anything to N do:

           Evaluate all other variables with respect to (2). If any
           variable satisfies (2) and is not in N, insert it.

3. Define a rule to be modifiable iff it is of the form P → αQβ for some Q in N
   and any α, β in V*. Since Q is nullable, it could be wiped out by the
   application of ε-rules. But those rules are about to be deleted. So one possibility
   should be that Q just doesn't get generated in the first place. To make that
   happen requires adding new rules. So, repeat until G' contains no modifiable rules
   that haven't been processed:

   3.1. Given the rule P → αQβ, where Q ∈ N, add the rule P → αβ if it is not
        already present and if αβ ≠ ε and if P ≠ αβ. This last check prevents adding
        the useless rule P → P, which would otherwise be generated if the original
        grammar contained, for example, the rule P → PQ and Q were nullable.

4. Delete from G' all rules of the form X → ε.

5. Return G'.

If removeEps halts, L(G') = L(G) − {ε} and G' contains no ε-rules. And
removeEps must halt. Since step 2 must add a nonterminal to N at each pass and it
cannot add any symbol more than once, it must halt within |V − Σ| passes. Step 3 may
have to be done once for every rule in G and once for every new rule that it adds. But
note that, whenever it adds a new rule, that rule has a shorter right-hand side than the
rule from which it came. So the number of new rules that can be generated by some
original rule in G is finite. So step 3 can execute only a finite number of times.


EXAMPLE 11.17 Eliminating ε-Rules

Let G = ({S, T, A, B, C, a, b, c}, {a, b, c}, R, S), where:

R = { S → aTa
      T → ABC
      A → aA | C
      B → Bb | C
      C → c | ε }.

On input G, removeEps behaves as follows: Step 2 finds the set N of nullable
variables by initially setting N to {C}. On its first pass through step 2.2 it adds A
and B to N. On the next pass, it adds T (since now A, B, and C are all in N). On the
next pass, no new elements are found, so step 2 halts with N = {C, A, B, T}. Step 3
adds the following new rules to G':


S → aa      /* Since T is nullable.
T → BC      /* Since A is nullable.
T → AC      /* Since B is nullable.
T → AB      /* Since C is nullable.
T → C       /* From T → BC, since B is nullable. Or from T → AC.
T → B       /* From T → BC, since C is nullable. Or from T → AB.
T → A       /* From T → AC, since C is nullable. Or from T → AB.
A → a       /* Since A is nullable.
B → b       /* Since B is nullable.

Finally, step 4 deletes the rule C → ε.


Sometimes L(G) contains ε and it is important to retain it. To handle this case, we
present the following algorithm, which constructs a new grammar G", such that
L(G") = L(G). If L(G) contains ε, then G" will contain a single ε-rule that can be
thought of as being "quarantined". Its sole job is to generate the string ε. It can have no
interaction with the other rules of the grammar.

atmostoneEps(G: CFG) =

1. G" = removeEps(G).

2. If S_G is nullable then:      /* This means that ε ∈ L(G).

   2.1. Create in G" a new start symbol S*.

   2.2. Add to R_G" the two rules: S* → ε and S* → S_G.

3. Return G".
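
Step 2 of atmostoneEps is small enough to sketch directly. The sketch below assumes that removeEps has already been applied and that the caller knows whether the original start symbol was nullable; the function name, the parameter names and the rule representation (a set of (left-hand side, tuple of symbols) pairs, with the empty tuple standing for ε) are assumptions made for illustration.

```python
def at_most_one_eps(eps_free_rules, start, start_nullable, new_start="S*"):
    """Return (rules, start symbol) for a grammar with at most one ε-rule."""
    rules = set(eps_free_rules)
    if not start_nullable:
        # ε is not in L(G), so the ε-free grammar already generates L(G).
        return rules, start
    # Steps 2.1 and 2.2: a fresh start symbol carries the quarantined ε-rule.
    rules.add((new_start, ()))
    rules.add((new_start, (start,)))
    return rules, new_start


# ε-free rules for balanced parentheses: S -> (S) | () | SS.
BAL_EPS_FREE = {
    ("S", ("(", "S", ")")),
    ("S", ("(", ")")),
    ("S", ("S", "S")),
}
```

Since the new start symbol never appears on a right-hand side, the single ε-rule cannot interact with the rest of the grammar.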


EXAMPLE 11.18 Eliminating ε-Rules from the Balanced Parens Grammar

We again consider Bal = {w ∈ {), (}* : the parentheses are balanced} and the
grammar G = ({S, ), (}, {), (}, R, S), where:

R = { S → (S)      (1)
      S → SS       (2)
      S → ε }.     (3)

We would like to eliminate the ambiguity in G. Since ε ∈ L(G), we call
atmostoneEps(G), which begins by applying removeEps to G:

• In step 2, N = {S}.

• In step 3, rule (1) causes us to add the rule S → (). Rule (2) causes us to
consider adding the rule S → S, but we omit adding rules whose right-hand sides
and left-hand sides are the same.

• In step 4, we delete the rule S → ε.

So removeEps(G) returns the grammar G' = ({S, ), (}, {), (}, R, S), where:

R = { S → (S)
      S → ()
      S → SS }.

In its step 2.1, atmostoneEps creates the new start symbol S*. In step 2.2, it adds
the two rules S* → ε and S* → S. So atmostoneEps returns the grammar
G" = ({S*, S, ), (}, {), (}, R, S*), where:

R = { S* → ε
      S* → S
      S → (S)
      S → ()
      S → SS }.

The string (())() has only one parse in G".


Eliminating  Symmetric  Recursive  Rules 

The new grammar that we just built for Bal is better than our original one. But it is still
ambiguous. The string ()()() has two parses, shown in Figure 11.1. The problem now is
the rule S → SS, which must be applied n − 1 times to generate a sequence of n balanced
parentheses substrings. But, at each time after the first, there is a choice of which
existing S to split.


[FIGURE 11.1 Two parse trees for the string ()()().]


The solution to this problem is to rewrite the grammar so that there is no longer a
choice. We replace the rule S → SS with one of the following rules:

S → SS₁      /* force branching to the left.

S → S₁S      /* force branching to the right.

Then we add the rule S → S₁ and replace the rules S → (S) and S → () with the rules
S₁ → (S) and S₁ → (). What we have done is to change the grammar so that branching can
occur only in one direction. Every S that is generated can branch, but no S₁ can. When all
the branching has happened, S rewrites to S₁ and the rest of the derivation can occur.

So one unambiguous grammar for Bal is G = ({S*, S, S₁, ), (}, {), (}, R, S*), where:

R = { S* → ε       (1)
      S* → S       (2)
      S → SS₁      (3)      /* Force branching to the left.
      S → S₁       (4)
      S₁ → (S)     (5)
      S₁ → () }.   (6)
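
The difference between the two grammars for Bal can be checked mechanically by counting parse trees. The following sketch is not an algorithm from this chapter: it is a simple memoized count over substrings, and it assumes that no nonterminal appearing on a right-hand side can derive ε and that there are no cycles of unit rules. The grammar representation (a dictionary from nonterminal to a list of right-hand-side tuples) and all names are choices made for illustration, and S₁ is written S1.

```python
from functools import lru_cache


def count_parses(rules, start, w):
    """Count the parse trees of w rooted at start."""

    @lru_cache(maxsize=None)
    def count_sym(sym, i, j):
        # Number of ways sym derives w[i:j].
        if sym not in rules:  # sym is a terminal
            return 1 if j == i + 1 and w[i] == sym else 0
        return sum(count_seq(rhs, i, j) for rhs in rules[sym])

    @lru_cache(maxsize=None)
    def count_seq(seq, i, j):
        # Number of ways the symbol sequence seq derives w[i:j].
        if not seq:
            return 1 if i == j else 0
        if len(seq) == 1:
            return count_sym(seq[0], i, j)
        # Each remaining symbol must cover at least one character.
        return sum(
            count_sym(seq[0], i, k) * count_seq(seq[1:], k, j)
            for k in range(i + 1, j - len(seq) + 2)
        )

    return count_sym(start, 0, len(w))


# Grammar with the symmetric rule S -> SS.
SYMMETRIC = {
    "S*": [(), ("S",)],
    "S": [("(", "S", ")"), ("(", ")"), ("S", "S")],
}

# Grammar that forces branching to the left.
LEFT_BRANCHING = {
    "S*": [(), ("S",)],
    "S": [("S", "S1"), ("S1",)],
    "S1": [("(", "S", ")"), ("(", ")")],
}
```

With these definitions, count_parses(SYMMETRIC, "S*", "()()()") evaluates to 2, while count_parses(LEFT_BRANCHING, "S*", "()()()") evaluates to 1.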
The technique that we just used for Bal is useful in any situation in which ambiguity
arises from a recursive rule whose right-hand side contains two or more
copies of the left-hand side. An important application of this idea is to expression
languages, like the language of arithmetic expressions that we introduced in
Example 11.14.


EXAMPLE 11.19 An Unambiguous Expression Grammar

Consider again the language Expr, which we defined with the following context-free
grammar G = ({E, id, +, *, (, )}, {id, +, *, (, )}, R, E), where:

R = { E → E + E
      E → E * E
      E → (E)
      E → id }.

G is ambiguous in two ways:

1. It fails to specify associativity. So, for example, there are two parses for the
string id + id + id, corresponding to the bracketings (id + id) + id and
id + (id + id).

2. It fails to define a precedence hierarchy for the operators + and *. So, for
example, there are two parses for the string id + id * id, corresponding to the
bracketings (id + id) * id and id + (id * id).

The first of these problems is analogous to the one we just solved for Bal. We
could apply that solution here, but then we'd still have the second problem. We
can solve both of them with the following grammar G' = ({E, T, F, id,
+, *, (, )}, {id, +, *, (, )}, R, E), where:

R = { E → E + T
      E → T
      T → T * F
      T → F
      F → (E)
      F → id }.

Just as we did for Bal, we have forced branching to go in a single direction (to
the left) when identical operators are involved. And, by adding the levels T (for
term) and F (for factor) we have defined a precedence hierarchy: Times has
higher precedence than plus does. Using G', there is now a single parse for the
string id + id * id:

[Figure: parse tree for id + id * id, rooted at E.]
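
The effect of G' on bracketing can be seen in a small hand-written parser. This is a sketch, not a construction from the text: it replaces the left-recursive rules E → E + T and T → T * F by loops that build left-nested results, a standard way to implement such rules in a recursive-descent style. The function name and the nested-tuple output format are assumptions made for illustration.

```python
def parse_expr(tokens):
    """Return a nested tuple showing how G' groups a list of tokens."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}")
        pos += 1

    def expr():  # E -> E + T | T, built left to right
        left = term()
        while peek() == "+":
            eat("+")
            left = ("+", left, term())
        return left

    def term():  # T -> T * F | F, built left to right
        left = factor()
        while peek() == "*":
            eat("*")
            left = ("*", left, factor())
        return left

    def factor():  # F -> (E) | id
        if peek() == "(":
            eat("(")
            inner = expr()
            eat(")")
            return inner
        eat("id")
        return "id"

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r} at position {pos}")
    return tree
```

For example, parse_expr("id + id * id".split()) groups the multiplication first, and parse_expr("id + id + id".split()) groups to the left, matching the associativity and precedence that G' imposes.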


Ambiguous  Attachment 

The third source of ambiguity that we will consider arises when constructs with optional
fragments are nested. The problem in such cases is then, "Given an instance of the
optional fragment, at what level of the parse tree should it be attached?"

Probably the most often described instance of this kind of ambiguity is known as the
dangling else problem. Suppose that we define a programming language with an if
statement that can have either of the following forms:

<stmt> ::= if <cond> then <stmt>

<stmt> ::= if <cond> then <stmt> else <stmt>

In other words, the else clause is optional. Then the following statement, with just a
single else clause, has two parses:

if cond1 then if cond2 then st1 else st2

In the first parse, the single else clause goes with the first if. (So it attaches high in
the parse tree.) In the second parse, the single else clause goes with the second if. (In
this case, it attaches lower in the parse tree.)


EXAMPLE 11.20 The Dangling Else Problem in Java

Most programming languages that have the dangling else problem (including C,
C++, and Java) specify that each else goes with the innermost if to which it can
be attached. The Java grammar forces this to happen by changing the rules to
something like these (presented here in a simplified form that omits many of the
statement types that are allowed):

<Statement> ::= <IfThenStatement> | <IfThenElseStatement> |
                <IfThenElseStatementNoShortIf> | ...
<StatementNoShortIf> ::= <block> | <IfThenElseStatementNoShortIf> | ...
<IfThenStatement> ::= if ( <Expression> ) <Statement>
<IfThenElseStatement> ::= if ( <Expression> ) <StatementNoShortIf> else
                          <Statement>
<IfThenElseStatementNoShortIf> ::= if ( <Expression> )
                          <StatementNoShortIf> else <StatementNoShortIf>

In this grammar, there is a special class of statements called
<StatementNoShortIf>. These are statements that are guaranteed not to end with a short
(i.e., else-less) if statement. The grammar uses this class to guarantee that, if a
top-level if statement has an else clause, then any embedded if must also have
one. To see how this works, consider the following parse tree:


                   <Statement>
                        |
               <IfThenElseStatement>
                        |
     if (cond) <StatementNoShortIf> else <Statement>


The top-level if statement claims the else clause for itself by guaranteeing
that there will not be an embedded if that is missing an else. If there were, then
that embedded if would grab the one else clause there is.


For  a  discussion  of  other  ways  in  which  programming  languages  can  solve 
this  problem,  see  G.3. 


Attachment ambiguity is also a problem for parsers for natural languages such as
English, as we'll see in Example 11.22.


Proving  that  a  Grammar  is  Unambiguous 

While it is undecidable, in general, whether a grammar is ambiguous or unambiguous, it
may be possible to prove that a particular grammar is either ambiguous or unambiguous.
A grammar G can be shown to be ambiguous by exhibiting a single string for
which G produces two parse trees. To see how it might be possible to prove that G is
unambiguous, recall that G is unambiguous iff every string derivable in G has a single
leftmost derivation. So, if we can show that, at each step of any leftmost derivation of
any string w ∈ L(G), exactly one rule can be applied, then G is unambiguous.


EXAMPLE 11.21 The Final Balanced Parens Grammar is Unambiguous

We return to the final grammar G that we produced for Bal.
G = ({S*, S, S₁, ), (}, {), (}, R, S*), where:

R = { S* → ε       (1)
      S* → S       (2)
      S → SS₁      (3)
      S → S₁       (4)
      S₁ → (S)     (5)
      S₁ → () }.   (6)

We prove that G is unambiguous. Given the leftmost derivation of any string
w in L(G), there is, at each step of the derivation, a unique symbol, which we'll
call X, that is the leftmost nonterminal in the working string. Whatever X is, it
must be expanded by the next rule application, so the only rules that may be
applied next are those with X on the left-hand side. There are three nonterminals
in G. We show, for each of them, that the rules that expand them never
compete in the leftmost derivation of a particular string w. We do the two easy
cases first:

• S*: The only place that S* may occur in a derivation is at the beginning. If w = ε,
then rule (1) is the only one that can be applied. If w ≠ ε, then rule (2) is the only
one that can be applied.

• S₁: If the next two characters to be derived are (), S₁ must expand by rule (6).
Otherwise, it must expand by rule (5).

In order to discuss S, we first define, for any matched set of parentheses m, the
siblings of m to be the smallest set that includes any matched set p adjacent, on
the right, to m and all of p's siblings. So, for example, consider the string:

    ( ( ) ( ) ) ( ) ( )
      \_/ \_/   \_/ \_/
       1   2     3   4
    \_________/
         5

The set () labeled 1 has a single sibling, 2. The set (()()) labeled 5 has two
siblings, 3 and 4. Now we can consider S. We observe that:

• S must generate a string in Bal and so it must generate a matched set, possibly
with siblings.

• So the first terminal character in any string that S generates is (. Call the string
that starts with that ( and ends with the ) that matches it, s.

• The only thing that S₁ can generate is a single matched set of parentheses that
has no siblings.

• Let n be the number of siblings of s. In order to generate those siblings, S must
expand by rule (3) exactly n times (producing n copies of S₁) before it expands
by rule (4) to produce a single S₁, which will produce s. So, at every step in a
derivation, let p be the number of occurrences of S₁ to the right of S. If p < n,
S must expand by rule (3). If p = n, S must expand by rule (4).
Going Too Far

We must be careful, in getting rid of ambiguity, that we don't do so at the expense of
being able to generate the parse trees that we want. In both the arithmetic expression
example and the dangling else case, we were willing to force one interpretation.
Sometimes, however, that is not an acceptable solution.

EXAMPLE 11.22 Throwing Away The Parses That We Want

Let's return to the small English grammar that we showed in Example 11.6. That
grammar is ambiguous. It has an ambiguous attachment problem, similar to the
dangling else problem. Consider the following two sentences:

Chris likes the girl with a cat.

Chris shot the bear with a rifle.

Each of these sentences has two parse trees because, in each case, the prepositional
phrase, with a N, can be attached either to the immediately preceding NP
(the girl or the bear) or to the VP. The correct interpretation for the first
sentence is that there is a girl with a cat and Chris likes her. In other words, the
prepositional phrase attaches to the NP. Almost certainly, the correct interpretation for
the second sentence is that there is a bear (with no rifle) and Chris used a rifle to
shoot it. In other words, the prepositional phrase attaches to the VP. See L.3.4 for
additional discussion of this example.

For now, the key point is that we could solve the ambiguity problem by eliminating
one of the choices for PP attachment. But then, for one of our two sentences,
we'd get a parse tree that corresponds to nonsense. In other words, we
might still have a grammar with the required weak generative capacity, but we
would no longer have one with the required strong generative capacity. The solution
to this problem is to add some additional mechanism to the context-free
framework. That mechanism must be able to choose the parse that corresponds to
the most likely meaning.


English  parsers  must  have  ways  to  handle  various  kinds  of  attachment  am¬ 
biguities,  including  those  caused  by  prepositional  phrases  and  relative 
clauses.  (L.3.4) 


11.8  Normal  Forms  * 

So far, we've imposed no restrictions on the form of the right-hand sides of our grammar
rules, although we have seen that some kinds of rules, like those whose right-hand
side is ε, can make grammars harder to use. In this section, we consider what happens
if we carry the idea of getting rid of ε-productions a few steps farther.

Normal forms for queries and data can simplify database processing. (H.5)
Normal forms for logical formulas can simplify automated reasoning in artificial
intelligence systems (M.2) and in program verification systems. (H.1.1)


Let C be any set of data objects. For example, C might be the set of context-free
grammars. Or it could be the set of syntactically valid logical expressions or a set of
database queries. We'll say that a set F is a normal form for C iff it possesses the
following two properties:

• For every element c of C, except possibly a finite set of special cases, there exists
some element f of F such that f is equivalent to c with respect to some set of tasks.

• F is simpler than the original form in which the elements of C are written. By
"simpler" we mean that at least some tasks are easier to perform on elements of F than
they would be on elements of C.

We  define  normal  forms  in  order  to  make  other  tasks  easier.  For  example,  it  might 
be  easier  to  build  a  parser  if  we  could  make  some  assumptions  about  the  form  of  the 
grammar  rules  that  the  parser  will  use.  Recall  that,  in  Section  5.8,  we  introduced  the 
notion  of  a  canonical  form  for  a  set  of  objects.  A  normal  form  is  a  weaker  notion,  since 
it  does  not  require  that  there  be  a  unique  representation  for  each  object  in  C,  nor  does 
it  require  that  “equivalent”  objects  map  to  the  same  representation.  So  it  is  sometimes 
possible  to  define  useful  normal  forms  when  no  useful  canonical  form  exists.  We’ll  now 
do  that  for  context-free  grammars. 


11.8.1 Normal Forms for Grammars

We’ll  define  the  following  two  useful  normal  forms  for  context-free  grammars: 

• Chomsky Normal Form: In a Chomsky normal form grammar G = (V, Σ, R, S),
all rules have one of the following two forms:

   • X → a, where a ∈ Σ, or

   • X → BC, where B and C are elements of V − Σ.

Every parse tree that is generated by a grammar in Chomsky normal form has a
branching factor of exactly 2, except at the branches that lead to the terminal
nodes, where the branching factor is 1. This property makes Chomsky normal form
grammars useful in several ways, including:

   • Parsers can exploit efficient data structures for storing and manipulating binary
   trees.

   • Every derivation of a string w contains |w| − 1 applications of some rule of the
   form X → BC, and |w| applications of some rule of the form X → a. So it is
   straightforward to define a decision procedure to determine whether w can be
   generated by a Chomsky normal form grammar G.

In addition, because the form of all the rules is so restricted, it is easier than it
would otherwise be to define other algorithms that manipulate grammars.

• Greibach Normal Form: In a Greibach normal form grammar G = (V, Σ, R, S),
all rules have the following form:

   • X → aβ, where a ∈ Σ and β ∈ (V − Σ)*.

In every derivation that is produced by a grammar in Greibach normal form,
precisely one terminal is generated for each rule application. This property is useful in
several ways, including:

   • Every derivation of a string w contains |w| rule applications. So again it is
   straightforward to define a decision procedure to determine whether w can be
   generated by a Greibach normal form grammar G.

   • As we'll see in Theorem 14.2, Greibach normal form grammars can easily be
   converted to pushdown automata with no ε-transitions. This is useful because
   such PDAs are guaranteed to halt.
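
Both definitions are purely syntactic conditions on rules, so they are easy to test. The two checkers below are a minimal sketch; they assume that rules are (left-hand side, right-hand side) pairs with the right-hand side a tuple of symbols, and that the set of terminals Σ is passed in explicitly. The function names are hypothetical.

```python
def is_chomsky(rules, terminals):
    """True iff every rule has the form X -> a or X -> BC."""
    for _, rhs in rules:
        single_terminal = len(rhs) == 1 and rhs[0] in terminals
        two_nonterminals = len(rhs) == 2 and all(s not in terminals for s in rhs)
        if not (single_terminal or two_nonterminals):
            return False
    return True


def is_greibach(rules, terminals):
    """True iff every rule has the form X -> a β with β a string of nonterminals."""
    return all(
        len(rhs) >= 1
        and rhs[0] in terminals
        and all(s not in terminals for s in rhs[1:])
        for _, rhs in rules
    )


# S -> AB, A -> a, B -> b is in Chomsky normal form but not Greibach normal form.
CNF_SAMPLE = {("S", ("A", "B")), ("A", ("a",)), ("B", ("b",))}
# S -> aB, B -> b is in Greibach normal form but not Chomsky normal form.
GNF_SAMPLE = {("S", ("a", "B")), ("B", ("b",))}
```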

THEOREM 11.1 Chomsky Normal Form

Theorem: Given a context-free grammar G, there exists a Chomsky normal form
grammar G_C such that L(G_C) = L(G) − {ε}.

Proof: The proof is by construction, using the algorithm converttoChomsky
presented below.

THEOREM 11.2 Greibach Normal Form

Theorem: Given a context-free grammar G, there exists a Greibach normal form
grammar G_G such that L(G_G) = L(G) − {ε}.

Proof:  The  proof  is  also  by  construction.  We  present  it  in  I XI . 

11.8.2  Converting  to  a  Normal  Form 

Normal  forms  are  useful  if  there  exists  a  procedure  for  converting  an  arbitrary  object 
into  a  corresponding  object  that  meets  the  requirements  of  the  normal  form.  Algo¬ 
rithms  to  convert  grammars  into  normal  forms  generally  begin  with  a  grammar  G  and 
then  operate  in  a  series  of  steps  as  follows: 

1. Apply some transformation to G to get rid of undesirable property 1. Show that
the language generated by G is unchanged.

2. Apply another transformation to G to get rid of undesirable property 2. Show
that the language generated by G is unchanged and that undesirable property 1
has not been reintroduced.

3.  Continue  until  the  grammar  is  in  the  desired  form. 

Because  it  is  possible  for  one  transformation  to  undo  the  work  of  an  earlier  one,  the 
order  in  which  the  transformation  steps  are  performed  is  often  critical  to  the  correct¬ 
ness  of  the  transformation  algorithm. 
One transformation that we will exploit in converting grammars both to Chomsky
normal form and to Greibach normal form is based on the following observation.
Consider a grammar that contains the three rules:

X → aYc

Y → b

Y → ZZ

We can construct an equivalent grammar by replacing the X rule with the rules:

X → abc
X → aZZc

Instead of letting X generate an instance of Y, X immediately generates whatever Y
could have generated. The following theorem generalizes this claim.

THEOREM 11.3 Rule Substitution

Theorem: Let G = (V, Σ, R, S) be a context-free grammar that contains a rule r of
the form X → αYβ, where α and β are elements of V* and Y ∈ (V − Σ). Let
Y → γ1 | γ2 | ... | γn be all of G's rules whose left-hand side is Y. And let G' be the
result of removing from R the rule r and replacing it by the rules
X → αγ1β, X → αγ2β, ..., X → αγnβ. Then L(G') = L(G).

Proof: We first show that every string in L(G) is also in L(G'): Suppose that w is in
L(G). If G can derive w without using rule r, then G' can do so in exactly the
same way. If G can derive w using rule r, then one of its derivations has the
following form, for some value of k between 1 and n:

S ⇒ ... ⇒ δXφ ⇒ δαYβφ ⇒ δαγkβφ ⇒ ... ⇒ w.

Then G' can derive w with the derivation:

S ⇒ ... ⇒ δXφ ⇒ δαγkβφ ⇒ ... ⇒ w.

Next we show that only strings in L(G) can be in L(G'). This must be so because
the action of every new rule X → αγkβ could have been performed in G by
applying the rule X → αYβ and then the rule Y → γk.
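
Rule substitution is mechanical enough to sketch in a few lines. The function below is an illustration only: it assumes rules are (left-hand side, tuple of symbols) pairs, and it takes the position of Y within the right-hand side as an explicit argument (a hypothetical parameter named index) so that the occurrence being replaced is unambiguous.

```python
def substitute(rules, rule, index):
    """Replace rule X -> αYβ by X -> αγβ for every rule Y -> γ in rules."""
    lhs, rhs = rule
    y = rhs[index]
    alpha, beta = rhs[:index], rhs[index + 1:]
    # Remove r, then add one new rule for each rule whose left-hand side is Y.
    new_rules = {r for r in rules if r != rule}
    for y_lhs, gamma in rules:
        if y_lhs == y:
            new_rules.add((lhs, alpha + gamma + beta))
    return new_rules


# The three rules X -> aYc, Y -> b, Y -> ZZ.
SAMPLE = {
    ("X", ("a", "Y", "c")),
    ("Y", ("b",)),
    ("Y", ("Z", "Z")),
}
```

Calling substitute(SAMPLE, ("X", ("a", "Y", "c")), 1) yields the rules X → abc and X → aZZc together with the unchanged Y rules.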

11.8.3  Converting  to  Chomsky  Normal  Form 

There exists a straightforward four-step algorithm that converts a grammar
G = (V, Σ, R, S) into a new grammar G_C such that G_C is in Chomsky normal form and
L(G_C) = L(G) − {ε}. Define:

converttoChomsky(G: CFG) =

1. Let G_C be the result of removing from G all ε-rules, using the algorithm
removeEps, defined in Section 11.7.4.

2. Let G_C be the result of removing from G_C all unit productions (rules of the
form A → B), using the algorithm removeUnits defined below. It is important
that removeUnits run after removeEps, since removeEps may introduce unit
productions. Once this step has been completed, all rules whose right-hand
sides have length 1 are in Chomsky normal form (i.e., they are composed of a
single terminal symbol).

3. Let G_C be the result of removing from G_C all rules whose right-hand sides
have length greater than 1 and include a terminal (e.g., A → aB or A → BaC).
This step is simple and can be performed by the algorithm removeMixed given
below. Once this step has been completed, all rules whose right-hand sides
have length 1 or 2 are in Chomsky normal form.

4. Let G_C be the result of removing from G_C all rules whose right-hand sides
have length greater than 2 (e.g., A → BCDE). This step too is simple. It can be
performed by the algorithm removeLong given below.

5. Return G_C.

A unit production is a rule whose right-hand side consists of a single nonterminal
symbol. The job of removeUnits is to remove all unit productions and to replace them by
a set of other rules that accomplish the job previously done by the unit productions. So,
for example, suppose that we start with a grammar G that contains the following rules:

X → A

A → B | a

B → b

Once we get rid of unit productions, it will no longer be possible for X to become A
(and then B) and thus to go on to generate a or b. So X will need the ability to go directly
to a and b, without any intermediate steps. We can define removeUnits as follows:

removeUnits(G: CFG) =

1. Let G' = G.

2. Until no unit productions remain in G' do:

   2.1. Choose some unit production X → Y.

   2.2. Remove it from G'.

   2.3. Consider only rules that still remain in G'. For every rule Y → β, where
   β ∈ V*, do:

        Add to G' the rule X → β unless that is a rule that has already been
        removed once.

3. Return G'.

Notice that we have not bothered to check to make sure that we don't insert a rule
that is already present. Since R, the set of rules, is a set, inserting an element that is
already in the set has no effect.

At each step of its operation, removeUnits is performing the kind of rule substitution
described in Theorem 11.3. (It happens that both α and β are empty.) So that theorem
tells us that, at each step, the language generated by G' is unchanged from the previous
step. If removeUnits halts, it is clear that all unit productions have been removed. It is
less obvious that removeUnits can be guaranteed to halt. At each step, one unit production
is removed, but several new rules may be added, including new unit productions. To
see that removeUnits must halt, we observe that there is a bound of |V − Σ|² on the
number of unit productions that can be formed from a fixed set V − Σ of nonterminals.
At each step, removeUnits removes one element from that set and that element can
never be reinserted. So removeUnits must halt in at most |V − Σ|² steps.
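
A direct transcription of removeUnits might look like the following sketch. It assumes rules are (left-hand side, tuple of symbols) pairs and that the set of nonterminals is supplied by the caller; the choice of which unit production to remove next (the smallest in sorted order) is an arbitrary decision made here only to keep the run deterministic.

```python
def remove_units(rules, nonterminals):
    """Return an equivalent rule set with no unit productions."""
    rules = set(rules)
    removed = set()  # unit productions that have already been removed once
    while True:
        # Step 2.1: choose some unit production X -> Y, if any remain.
        unit = next(
            (r for r in sorted(rules) if len(r[1]) == 1 and r[1][0] in nonterminals),
            None,
        )
        if unit is None:
            return rules
        x, (y,) = unit
        # Step 2.2: remove it and remember that it must never come back.
        rules.remove(unit)
        removed.add(unit)
        # Step 2.3: for each remaining rule Y -> β, add X -> β.
        for lhs, beta in list(rules):
            if lhs == y and (x, beta) not in removed:
                rules.add((x, beta))


# Rules: S -> XY, X -> A, A -> B | a, B -> b, Y -> T, T -> Y | c.
UNIT_SAMPLE = {
    ("S", ("X", "Y")),
    ("X", ("A",)),
    ("A", ("B",)), ("A", ("a",)),
    ("B", ("b",)),
    ("Y", ("T",)),
    ("T", ("Y",)), ("T", ("c",)),
}
```

The removed set is what enforces the termination argument above: a unit production, once deleted, is never reinserted.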


EXAMPLE 11.23 Removing Unit Productions

Let G = (V, Σ, R, S), where:

R = { S → XY
      X → A
      A → B | a
      B → b
      Y → T
      T → Y | c }.

The order in which removeUnits chooses unit productions to remove doesn't
matter. We'll consider one order it could choose:

Remove X → A. Since A → B | a, add X → B | a.

Remove X → B. Add X → b.

Remove Y → T. Add Y → Y | c. Notice that we've added Y → Y, which is
useless, but it will be removed later.

Remove Y → Y. Consider adding Y → T, but don't since it has previously been
removed.

Remove A → B. Add A → b.

Remove T → Y. Add T → c, but with no effect since it was already present.

At this point, the rules of G are:

S → XY
A → a | b
B → b
T → c
X → a | b
Y → c

No unit productions remain, so removeUnits halts.
We must now define the two straightforward algorithms that are required by steps 3
and 4 of the conversion algorithm that we sketched above. We begin by defining:

removeMixed(G: CFG) =

1. Let G' = G.

2. Create a new nonterminal T_a for each terminal a in Σ.

3. Modify each rule in G' whose right-hand side has length greater than 1 and that
contains a terminal symbol by substituting T_a for each occurrence of the terminal a.

4. Add to G', for each T_a, the rule T_a → a.

5. Return G'.
EXAMPLE 11.24 Removing Mixed Productions

The result of applying removeMixed to the grammar:

A → a
A → aB
A → BaC
A → BbC

is the grammar:

A → a
A → T_aB
A → BT_aC
A → BT_bC
T_a → a
T_b → b
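
A sketch of removeMixed in Python follows. It assumes rules are (left-hand side, tuple of symbols) pairs, that the terminal alphabet is passed in explicitly, and that names of the form "T_a" do not clash with existing nonterminals; that naming scheme is a convention chosen for this sketch.

```python
def remove_mixed(rules, terminals):
    """Replace terminals in right-hand sides of length > 1 by fresh nonterminals."""
    new_rules = set()
    for lhs, rhs in rules:
        if len(rhs) > 1:
            # Step 3: substitute T_a for each occurrence of the terminal a.
            rhs = tuple(f"T_{s}" if s in terminals else s for s in rhs)
        new_rules.add((lhs, rhs))
    # Step 4: add the rule T_a -> a for each terminal a.
    new_rules |= {(f"T_{a}", (a,)) for a in terminals}
    return new_rules


# Rules: A -> a | aB | BaC | BbC.
MIXED_SAMPLE = {
    ("A", ("a",)),
    ("A", ("a", "B")),
    ("A", ("B", "a", "C")),
    ("A", ("B", "b", "C")),
}
```

Rules whose right-hand side is a single terminal, such as A → a, are left alone because they are already in Chomsky normal form.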

Finally we define removeLong. The idea for removeLong is simple. If there is a rule
with n symbols on its right-hand side, replace it with a set of rules. The first rule
generates the first symbol followed by a new symbol that will correspond to "the rest". The
next rule rewrites that symbol as the second of the original symbols, followed by yet
another new one, again corresponding to "the rest", and so forth, until there are only
two symbols left to generate. So we define:

removeLong(G: CFG) =

1. Let G' = G.

2. For each G' rule r^k of the form A → N_1 N_2 N_3 N_4 ... N_n, n > 2, create new
nonterminals M^k_2, M^k_3, ..., M^k_(n−1).

3. In G', replace r^k with the rule A → N_1 M^k_2.

4. To G', add the rules M^k_2 → N_2 M^k_3, M^k_3 → N_3 M^k_4, ...,
M^k_(n−1) → N_(n−1) N_n.

5. Return G'.

When we illustrate this algorithm, we typically omit the superscripts on the M's, and,
instead, guarantee that we use distinct nonterminals by using distinct subscripts.


EXAMPLE 11.25 Removing Rules with Long Right-hand Sides

The result of applying removeLong to the single rule grammar:

A → BCDEF

is the grammar with rules:

A → BM_2
M_2 → CM_3
M_3 → DM_4
M_4 → EF
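
removeLong can be sketched as follows. The sketch assumes rules are (left-hand side, tuple of symbols) pairs and that every right-hand side has at least one symbol; the fresh nonterminals are named "M{k}_{i}", where k is the position of the rule in sorted order and i is the subscript. Both the naming scheme and the sorted numbering are assumptions made for illustration, and the names are assumed not to clash with existing symbols.

```python
def remove_long(rules):
    """Replace every rule with more than two right-hand-side symbols by a chain."""
    new_rules = set()
    for k, (lhs, rhs) in enumerate(sorted(rules)):
        n = len(rhs)
        if n <= 2:
            new_rules.add((lhs, rhs))
            continue
        # Step 2: fresh nonterminals M^k_2, ..., M^k_(n-1).
        fresh = [f"M{k}_{i}" for i in range(2, n)]
        # Step 3: A -> N_1 M^k_2.
        new_rules.add((lhs, (rhs[0], fresh[0])))
        # Step 4: M^k_i -> N_i M^k_(i+1), ending with M^k_(n-1) -> N_(n-1) N_n.
        for i in range(len(fresh) - 1):
            new_rules.add((fresh[i], (rhs[i + 1], fresh[i + 1])))
        new_rules.add((fresh[-1], (rhs[-2], rhs[-1])))
    return new_rules


# The single rule A -> BCDEF.
LONG_SAMPLE = {("A", ("B", "C", "D", "E", "F"))}
```

A rule with n symbols on its right-hand side becomes n − 1 rules, each with exactly two symbols on its right-hand side.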
We can now illustrate the four steps of converttoChomsky.


EXAMPLE 11.26 Converting a Grammar to Chomsky Normal Form

Let G = ({S, A, B, C, a, c}, {a, c}, R, S), where:

R = { S → aACa
      A → B | a
      B → C | c
      C → cC | ε }.

We convert G to Chomsky normal form. Step 1 applies removeEps to eliminate
ε-productions. We compute N, the set of nullable variables. Initially N = {C}.
Because of the rule B → C, we add B. Then, because of the rule A → B, we add A.
So N = {A, B, C}. Since both A and C are nullable, we derive three new rules
from the first original rule, giving us:

S → aACa | aAa | aCa | aa

We add A → ε and B → ε, but both of them will disappear at the end of this
step. We also add C → c. So removeEps returns the rule set:

S → aACa | aAa | aCa | aa
A → B | a
B → C | c
C → cC | c

Next  we  apply  removeUnits : 

Remove  A  — »■  B.  Add  A  — ►  C 1  c. 

Remove  B  — »  C.  Add  B  — ►  cC  (and  B  -*  c,  but  it  was  already  there). 
Remove  A-*C.  Add  A  — *  cC  (and  A  — *  c,  but  it  was  already  there). 

So  removeUnits  returns  the  rule  set: 

S → aACa | aAa | aCa | aa
A → a | c | cC
B → c | cC
C → cC | c

Next  we  apply  removeMixed ,  which  returns  the  rule  set: 

S → T_a A C T_a | T_a A T_a | T_a C T_a | T_a T_a
A → a | c | T_c C
B → c | T_c C
C → T_c C | c
T_a → a
T_c → c

Finally, we apply removeLong, which returns the rule set:

S → T_a S_1    S → T_a S_3    S → T_a S_4    S → T_a T_a
S_1 → A S_2    S_3 → A T_a    S_4 → C T_a
S_2 → C T_a
A → a | c | T_c C
B → c | T_c C
C → T_c C | c
T_a → a
T_c → c
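
The first step of the example, computing the nullable set N and then expanding each rule, can be sketched in Python. This is a sketch under stated assumptions rather than the text's removeEps verbatim: rules are (left-hand side, tuple of symbols) pairs, the empty tuple stands for ε, and ε-rules are simply never emitted instead of being added and later deleted.

```python
from itertools import product

def nullable_set(rules):
    """Fixed-point computation of N, the set of nullable nonterminals."""
    nullable = {lhs for lhs, rhs in rules if rhs == ()}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in nullable and rhs and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
    return nullable

def remove_eps(rules):
    """Return the rule set with epsilon-rules eliminated."""
    nullable = nullable_set(rules)
    out = set()
    for lhs, rhs in rules:
        # For each nullable symbol, independently keep it or drop it.
        options = [((s,), ()) if s in nullable else ((s,),) for s in rhs]
        for choice in product(*options):
            new_rhs = tuple(sym for part in choice for sym in part)
            if new_rhs:                     # never emit X -> epsilon
                out.add((lhs, new_rhs))
    return out

# The grammar of Example 11.26.
G = [
    ("S", ("a", "A", "C", "a")),
    ("A", ("B",)), ("A", ("a",)),
    ("B", ("C",)), ("B", ("c",)),
    ("C", ("c", "C")), ("C", ()),
]

if __name__ == "__main__":
    print(sorted(nullable_set(G)))
    for lhs, rhs in sorted(remove_eps(G)):
        print(lhs, "->", "".join(rhs))
```

Run on the grammar above, the sketch yields the same rule set that the example reports for removeEps.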


From Example 11.26 we see that the Chomsky normal form version of a grammar may be longer than the original grammar was. How much longer? And how much time may be required to execute the conversion algorithm? We can answer both of these questions by answering them for each of the steps that the conversion algorithm executes. Let n be the length of an original grammar G. Then we have:

1. Use removeEps to remove ε-rules: Suppose that G contains a rule of the form X → A_1 A_2 A_3 ... A_k. If all of the variables A_1 through A_k are nullable, this single rule will be rewritten as 2^k - 1 rules (since each of the k nonterminals can either be present or not, except that they cannot all be absent). Since k can grow as n, we have that the length of the grammar that removeEps produces (and thus the amount of time that removeEps requires) is O(2^n). In this worst case, the conversion algorithm becomes impractical for all but toy grammars. We can prevent this worst case from occurring though. Suppose that all right-hand sides can be guaranteed to be short. For example, suppose they all have length at most 2. Then no rule will be rewritten as more than 3 rules. We can make this guarantee if we modify converttoChomsky slightly. We will run removeLong as step 1 rather than as step 4. Note that none of the other steps can create a rule whose right-hand side is longer than the right-hand side of some rule that already exists. So it is not necessary to rerun removeLong later. With this change, removeEps runs in linear time.

2. Use removeUnits to remove unit productions: We've already shown that this step must halt in at most |V - Σ|^2 steps. Each of those steps takes constant time and may create one new rule. So the length of the grammar that removeUnits produces, as well as the time required for it to run, is O(n^2).

3. Use removeMixed to remove rules with right-hand sides of length greater than 1 and that contain a terminal symbol: This step runs in linear time and constructs a grammar whose size grows linearly.

4. Use removeLong to remove rules with long right-hand sides: This step runs in linear time and constructs a grammar whose size grows linearly.

So, if we change converttoChomsky so that it does step 4 first, its time complexity is O(n^2) and the size of the grammar that it produces is also O(n^2).
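
The difference that running removeLong first makes can be checked by counting. The sketch below is illustrative only: it takes a single hypothetical rule X → A1 ... Ak in which every Ai is nullable, and compares the number of rules that ε-removal produces directly with the number it produces after the rule has been broken into a chain of length-2 rules. The symbol names and helper functions are assumptions introduced here.

```python
from itertools import product

def expansions(rhs, nullable):
    """Nonempty right-hand sides obtained by optionally dropping nullable symbols."""
    options = [((s,), ()) if s in nullable else ((s,),) for s in rhs]
    results = {tuple(x for part in choice for x in part)
               for choice in product(*options)}
    results.discard(())          # X -> epsilon is never emitted
    return results

def chain(rhs):
    """Right-hand sides of the length-2 chain that removeLong builds for rhs."""
    tails = [f"M{i}" for i in range(2, len(rhs))] + [rhs[-1]]
    return [(rhs[i], tails[i]) for i in range(len(rhs) - 1)]

def rule_counts(k):
    """(rules without removeLong first, rules with removeLong first) for one
    rule X -> A1 ... Ak in which every Ai is nullable."""
    rhs = tuple(f"A{i}" for i in range(1, k + 1))
    direct = len(expansions(rhs, set(rhs)))
    pieces = chain(rhs)
    # Each fresh M symbol derives only nullable symbols, so it is nullable too.
    nullable = {s for piece in pieces for s in piece}
    chained = sum(len(expansions(piece, nullable)) for piece in pieces)
    return direct, chained

if __name__ == "__main__":
    for k in (3, 5, 10):
        print(k, rule_counts(k))
```

The first count is 2^k - 1, while the second is 3(k - 1), since each of the k - 1 chain rules is rewritten as at most 3 rules.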

11.8.4 The Price of Normal Forms

While normal forms are useful for many things, as we will see over the next few chapters, it is important to keep in mind that they exact a price and it’s one that we may or may not be willing to pay, depending on the application. If G is an arbitrary context-free grammar and G' is an equivalent grammar in Chomsky (or Greibach) normal form, then G and G' generate the same set of strings, but only in rare cases (for example if G happened already to be in normal form) do they assign to those strings the same parse
trees. Thus, while converting a grammar to a normal form has no effect on its weak generative capacity, it may have a significant effect on its strong generative capacity.

11.9 Island Grammars

Suppose that we want to parse strings that possess one or more of the following properties:

• Some (perhaps many) of them are ill-formed. In other words, while there may be a grammar that describes what strings are “supposed to look like”, there is no guarantee that the actual strings we’ll see conform to those rules. Consider, for example, any grammar you can imagine for English. Now imagine picking up the phone and hearing something like, “Um, I uh need a copy of uh my bill for er Ap, no May, I think, or June, maybe all of them uh, I guess that would work.” Or consider a grammar for HTML. It will require that tags be properly nested. But strings like <b><i>bold italic</b></i> show up not infrequently in HTML documents. Most browsers will do the right thing with them, so they never get debugged.

•  We  simply  don’t  know  enough  about  them  to  build  an  exact  model,  although  we  do 
know  something  about  some  patterns  that  we  think  the  strings  will  contain. 

• They may contain substrings in more than one language. For example, bi(multi)lingual people often mix their speech. We even give names to some of the resulting hybrids: Spanglish, Japlish, Hinglish, etc. Or consider a typical Web page. It may contain fragments of HTML, JavaScript, or other languages, interleaved with each other. Even when parsing strings that are all in the same “language”, dialectal issues may arise. For example, in response to the question, “Are you going to fix dinner tonight?” an American speaker of English might say, “I could,” while a British speaker of English might say, “I could do.” Similarly, in analyzing legacy software, there are countless dialects of languages like Fortran and Cobol.

• They may contain some substrings we care about, interleaved with other substrings we don't care about and don't want to waste time parsing. For example, when parsing an XML document to determine its top level structure, we may have no interest in the text or even in many of the tags.

Island grammars can play a useful role in reverse engineering software systems. (H.4.2)


In all of these cases, the role of any grammar we might build is different from the role a grammar plays, say, in a compiler. In the latter case, the grammar is prescriptive. A compiler can simply reject inputs that do not conform to the grammar it is given. Contrast that with a tool whose job is to analyze legacy software or handle customer phone calls. Such a tool must do the best it can with the input that it sees. When building tools of that sort, it may make sense to exploit what is called an island grammar. An island grammar is a grammar that has two parts:

•  A  set  of  detailed  rules  that  describe  the  fragments  that  we  care  about.  We’ll  call 
these  fragments  islands. 

•  A  set  of  flexible  rules  that  can  match  everything  else.  We'll  call  everything  else  the 
water. 

A  very  simple  form  of  island  grammar  is  a  regular  expression  that  just  describes  the 
patterns  that  we  seek.  A  regular  expression  matcher  ignores  those  parts  of  the  input 
string  that  do  not  match  the  patterns.  But  suppose  that  the  patterns  we  are  looking  for 
cannot  be  described  with  regular  expressions.  For  example,  they  may  require  balanced 
parentheses.  Or  suppose  that  we  want  to  assign  structure  to  the  islands.  In  that  case,  we 
need  something  more  powerful  than  a  regular  expression  (or  a  regular  grammar).  One 
way  to  view  a  context-free  island  grammar  is  that  it  is  a  hybrid  between  a  context-free 
grammar  and  a  set  of  regular  expressions. 

To see how island grammars work, consider the problem of examining legacy software to determine patterns of static subroutine invocation. To solve this problem, we could use the following island grammar, which is a simplification and modification of one presented in [Moonen 2001]:


[1]  <input> → <chunk>*
[2]  <chunk> → CALL <id> (<expr>)        {cons(CALL)}
[3]  <chunk> → CALL ERROR (<expr>)       {reject}
[4]  <chunk> → <water>
[5]  <water> → Σ*                        {avoid}

Rule 1 says that a complete input file is a set of chunks. The next three rules describe three kinds of chunks:

• Rule 2 describes the chunks we are trying to find. Assume that another set of rules (such as the ones we considered in Example 11.19) defines the valid syntax for expressions. Those rules may exploit the full power of a context-free grammar, for example to guarantee that parenthesized expressions are properly nested. Then rule 2 will find well-formed function calls. The action associated with it, {cons(CALL)}, tells the parser what kind of node to build whenever this rule is used.

• Rule 3 describes chunks that, although they could be formed by rule 2, are structures that we know we are not interested in. In this case, there is a special kind of error call that we want to ignore. The action {reject} says that whenever this rule matches, its result should be ignored.

• Rule 4 describes water, i.e., the chunks that correspond to the parts of the program that aren’t CALL statements. Rule 5 is used to generate the water. But notice that it has the {avoid} action associated with it. That means that it will not be used to match any text that can be matched by some other, non-avoiding rule.
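
The behavior that the five rules describe can be imitated by a small hand-written scanner. The Python sketch below is an illustration under simplifying assumptions, not a parser generated from the grammar: an identifier is taken to be a letter or underscore followed by word characters, <expr> is checked only for balanced parentheses rather than against a full expression grammar, and the function name find_calls is invented here.

```python
import re

# Head of a candidate island: CALL <id> (
CALL_HEAD = re.compile(r"\bCALL\s+([A-Za-z_]\w*)\s*\(")

def find_calls(text):
    """Return (name, argument_text) for each well-formed CALL island; skip the water."""
    islands = []
    pos = 0
    while True:
        m = CALL_HEAD.search(text, pos)
        if m is None:
            break                        # everything that remains is water
        depth, i = 1, m.end()
        while i < len(text) and depth > 0:
            if text[i] == "(":
                depth += 1
            elif text[i] == ")":
                depth -= 1
            i += 1
        if depth != 0:
            pos = m.end()                # unbalanced: treat this CALL as water
            continue
        name, args = m.group(1), text[m.end():i - 1]
        if name != "ERROR":              # rule 3: matched, but the result is ignored
            islands.append((name, args))
        pos = i
    return islands

if __name__ == "__main__":
    sample = "X = 1\nCALL FOO(A, (B+C))\nCALL ERROR(7)\nY = 2 CALL BAR()\n"
    print(find_calls(sample))
```

The balanced-parenthesis count is the part that a plain regular expression could not express; everything the scanner steps over plays the role of water.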

Island grammars can be exploited by appropriately crafted parsers. But we should note here, to avoid confusion, that there is also a somewhat different notion, called island parsing, in which the goal is to use a standard grammar to produce a complete parse given an input string. But, while conventional parsers read and analyze their inputs left-to-right, an island parser first scans its input looking for one or more regions where it seems likely that a correct parse tree can be built. Then it grows the parse tree outward from those “islands” of (relative) certainty. If the input is ill-formed (as is likely to happen, for example, in the case of spoken language understanding), then the final output of the parser will be a sequence of islands, rather than a complete parse. So island grammars and island parsing are both techniques for coping with ill-formed and unpredictable inputs. Island grammars approach the task by specifying, at grammar-writing time, which parts of the input should be analyzed and which should be ignored. Island parsers, in this other sense, approach the task by using a full grammar and deciding, at parse time, which input fragments appear to be parsable and which don't.

11.10  Stochastic  Context-Free  Grammars  * 

Recall that, at the end of our discussion of finite state machines in Chapter 5, we introduced the idea of a stochastic FSM: an NDFSM whose transitions have been augmented with probabilities that describe some phenomenon that we want to model. We can apply that same idea to context-free grammars: We can add probabilities to grammar rules and so create a stochastic context-free grammar (also called a probabilistic context-free grammar) that generates strings whose distribution matches some naturally occurring distribution with which we are concerned.


A stochastic context-free grammar can be used to generate random English text that may seem real enough to fool some people.

A stochastic context-free grammar G is a quintuple (V, Σ, R, S, D), where:

• V is the rule alphabet, which contains nonterminals (symbols that are used in the grammar but that do not appear in strings in the language) and terminals,

• Σ (the set of terminals) is a subset of V,

• R (the set of rules) is a finite subset of (V - Σ) × V*,

• S (the start symbol) can be any element of V - Σ, and

• D is a function from R to [0, 1]. So D assigns a probability to each rule in R. D must satisfy the requirement that, for every nonterminal symbol X, the sum of the probabilities associated with all rules whose left-hand side is X must be 1.


EXAMPLE  11.27  A  Simple  Stochastic  Grammar 

Recall PalEven = {ww^R : w ∈ {a, b}*}, the language of even-length palindromes of a's and b's. Suppose that we want to describe the specific case in which a's occur three times as often as b's do. Then we might write the grammar G = ({S, a, b}, {a, b}, R, S, D), where R and D are defined as follows:

S → aSa    [.72]
S → bSb    [.24]
S → ε      [.04]


Given a grammar G and a string s, the probability of a particular parse tree t is the product of the probabilities associated with the rules that were used to generate it. In other words, if we let C be the collection (in which duplicates count) of rules that were used to generate t and we let Pr(r) be the probability associated with rule r, then:

\[ \Pr(t) = \prod_{r \in C} \Pr(r). \]
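
As a worked instance of this product, the Python sketch below computes the probability of the parse tree for a string in PalEven under the grammar of Example 11.27. The dictionary D and the two helper functions are names chosen here for illustration; exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

# Rule probabilities D from Example 11.27, keyed by (left-hand side, right-hand side).
D = {
    ("S", "aSa"): Fraction(72, 100),
    ("S", "bSb"): Fraction(24, 100),
    ("S", ""):    Fraction(4, 100),     # S -> epsilon
}

def derivation(w):
    """Rules used, outermost first, in the parse of the even-length palindrome w."""
    rules = []
    while w:
        if len(w) < 2 or w[0] != w[-1] or w[0] not in "ab":
            raise ValueError("not in PalEven")
        rules.append(("S", w[0] + "S" + w[0]))
        w = w[1:-1]
    rules.append(("S", ""))
    return rules

def tree_probability(rules):
    """Product of the probabilities of the rules in the collection (duplicates count)."""
    p = Fraction(1)
    for r in rules:
        p *= D[r]
    return p

if __name__ == "__main__":
    print(float(tree_probability(derivation("abba"))))   # .72 * .24 * .04
```

Because this particular grammar gives each string in PalEven exactly one parse tree, the value computed for the tree is also the probability of generating the string.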


Stochastic  context-free  grammars  play  an  important  role  in  natural  language 
processing.  (L.3.6) 


Stochastic  grammars  can  be  used  to  answer  two  important  kinds  of  questions: 

• In an error-free environment, we know that we need to analyze a particular string s. So we want to solve the following problem: Given s, find the most likely parse tree for it.

• In a noisy environment, we may not be sure exactly what string we need to analyze. For example, suppose that it is possible that there have been spelling errors, so the true string is similar but not identical to the one we have observed. Or suppose that there may have been transmission errors. Or suppose that we have transcribed a spoken string and it is possible that we didn’t hear it correctly. In all of these cases we want to solve the following problem: Given a set of possible true strings X and an observed string o, find the particular string s (and possibly also the most likely parse for it) that is most likely to have been the one that was actually generated. Note that the probability of generating any particular string w is the sum of the probabilities of generating each possible parse tree for w. In other words, if T is the set of possible parse trees for w, then the total probability of generating w is:

\[ \Pr(w) = \sum_{t \in T} \Pr(t). \]

Then the sentence s that is most likely to have been generated, given the observation o, is the one with the highest conditional probability given o. Recall that argmax of w returns the value of the argument w that maximizes the value of the function it is given. So the highest probability sentence s is:

\[ s = \operatorname*{argmax}_{w \in X} \Pr(w \mid o) = \operatorname*{argmax}_{w \in X} \frac{\Pr(o \mid w)\,\Pr(w)}{\Pr(o)}. \]
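
The argmax computation can be illustrated with a small Python sketch. Everything about the noise model is an assumption made for the example: the hypothetical channel flips each symbol independently with probability 1/10 and never changes the length of the string. The prior Pr(w) comes from the grammar of Example 11.27, which gives each palindrome a single parse tree.

```python
from fractions import Fraction

RULE_P = {"a": Fraction(72, 100), "b": Fraction(24, 100), "": Fraction(4, 100)}

def prior(w):
    """Pr(w) under the grammar of Example 11.27 (0 if w is not in PalEven)."""
    if len(w) % 2 or w != w[::-1] or set(w) - set("ab"):
        return Fraction(0)
    p = RULE_P[""]                      # the final S -> epsilon step
    for ch in w[: len(w) // 2]:         # one S -> xSx step per symbol of the first half
        p *= RULE_P[ch]
    return p

def likelihood(o, w, flip=Fraction(1, 10)):
    """Pr(o | w) for a hypothetical channel that flips each symbol independently."""
    if len(o) != len(w):
        return Fraction(0)
    p = Fraction(1)
    for x, y in zip(o, w):
        p *= flip if x != y else 1 - flip
    return p

def most_likely(o, candidates):
    # Pr(o) is the same for every candidate, so it can be dropped from the argmax.
    return max(candidates, key=lambda w: likelihood(o, w) * prior(w))

if __name__ == "__main__":
    X = ["abba", "aaaa", "baab", "bbbb"]
    print(most_likely("abaa", X))
```

In this example the observed string abaa is one flip away from both abba and aaaa, so the likelihoods tie and the grammar's prior, which favors a's, decides the outcome.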


Stochastic context-free grammars can be used to model the three-dimensional structure of RNA. (K.4)


In  Chapter  15,  we  will  discuss  techniques  for  parsing  context-free  languages  that  are 
defined  by  standard  (i.e.,  without  probabilistic  information)  context-free  grammars. 
Those  techniques  can  be  extended  to  create  techniques  for  parsing  using  stochastic 
grammars.  So  they  can  be  used  to  answer  both  of  the  questions  that  we  just  presented. 


Exercises 

1. Let Σ = {a, b}. For the languages that are defined by each of the following grammars, do each of the following:

i. List five strings that are in L.

ii. List five strings that are not in L (or as many as there are, whichever is greater).

iii. Describe L concisely. You can use regular expressions, expressions using variables (e.g., a^n b^n), or set theoretic expressions (e.g., {x: ...}).

iv. Indicate whether or not L is regular. Prove your answer.

a. S → aS | Sb | ε

b. S → aSa | bSb | a | b

c. S → aS | bS | ε

d. S → aS | aSbS | ε

2.  Let  G  be  the  grammar  of  Example  11.12.  Show  a  third  parse  tree  that  G  can  pro¬ 
duce  for  the  string  (())(). 

3. Consider the following grammar G:

S → 0S1 | SS | 10

Show a parse tree produced by G for each of the following strings:

a.  010110. 

b.  00101101. 

4. Consider the following context free grammar G:

S → aSa
S → T
S → ε
T → bT
T → cT
T → ε

One of these rules is redundant and could be removed without altering L(G). Which one?

5. Using the simple English grammar that we showed in Example 11.6, show two parse trees for each of the following sentences. In each case, indicate which parse tree almost certainly corresponds to the intended meaning of the sentence:

a.  The  bear  shot  Fluffy  with  the  rifle. 

b.  Fluffy  likes  the  girl  with  the  chocolate. 

6. Show a context-free grammar for each of the following languages L:

a. BalDelim = {w : where w is a string of delimiters: (, ), [, ], {, }, that are properly balanced}.

b. {a^i b^j : 2i = 3j + 1}.

c. {a^i b^j : 2i ≠ 3j + 1}.

d. {w ∈ {a, b}* : #a(w) = 2 · #b(w)}.

e. L = {w ∈ {a, b}* : w = w^R}.

f. {a^i b^j c^k : i, j, k ≥ 0 and (i ≠ j or j ≠ k)}.

g. {a^i b^j c^k : i, j, k ≥ 0 and (k ≤ i or k ≤ j)}.

h. {w ∈ {a, b}* : every prefix of w has at least as many a's as b's}.

i. {a^n b^m : m ≥ n, m - n is even}.

j. {a^m b^n c^p d^q : m, n, p, q ≥ 0 and m + n = p + q}.

k. {xc^n : x ∈ {a, b}* and (#a(x) = n or #b(x) = n)}.

l. {b_i # b_(i+1)^R : b_i is the binary representation of some integer i, i ≥ 0, without leading zeros}. (For example 101#011 ∈ L.)

m. {x^R # y : x, y ∈ {0, 1}* and x is a substring of y}.

7.  Let  G  be  the  ambiguous  expression  grammar  of  Example  11.14.  Show  at  least 
three  different  parse  trees  that  can  be  generated  from  G  for  the  string 
id+id*id*id. 

8. Consider the unambiguous expression grammar G' of Example 11.19.

a. Trace a derivation of the string id+id*id*id in G'.

b. Add exponentiation (**) and unary minus (-) to G', assigning the highest precedence to unary minus, followed by exponentiation, multiplication, and addition, in that order.

9. Let L = {w ∈ {a, b, ∪, ε, (, ), *, +}* : w is a syntactically legal regular expression}.

a. Write an unambiguous context-free grammar that generates L. Your grammar should have a structure similar to the arithmetic expression grammar G' that we presented in Example 11.19. It should create parse trees that:

• Associate left given operators of equal precedence, and

• Correspond to assigning the following precedence levels to the operators (from highest to lowest):

  • * and +

  • concatenation

  • ∪

b. Show the parse tree that your grammar will produce for the string (a ∪ b) ba*.

10. Let L = {w ∈ {A - Z, ¬, ∧, ∨, →, (, )}* : w is a syntactically legal Boolean expression}.

a. Write an unambiguous context-free grammar that generates L and that creates parse trees that:

• Associate left given operators of equal precedence, and

• Correspond to assigning the following precedence levels to the operators (from highest to lowest): ¬, ∧, ∨, and →.

b. Show the parse tree that your grammar will produce for the string:

¬P ∨ R → Q → S

11. In I.3.1, we present a simplified grammar for URIs (Uniform Resource Identifiers), the names that we use to refer to objects on the Web.

a. Using that grammar, show a parse tree for:

https://www.mystuff.wow/widgets/fradgit#sword

b.  Write  a  regular  expression  that  is  equivalent  to  the  grammar  that  we  present. 

12.  Prove  that  each  of  the  following  grammars  is  correct: 

a.  The  grammar,  shown  in  Example  11.3,  for  the  language  PalEven. 

b. The grammar, shown in Example 11.1, for the language Bal.

13. For each of the following grammars G, show that G is ambiguous. Then find an equivalent grammar that is not ambiguous.

a. ({S, A, B, T, a, c}, {a, c}, R, S), where R = {S → AB, S → BA, A → aA, A → ac, B → Tc, T → aT, T → a}.

b. ({S, a, b}, {a, b}, R, S), where R = {S → ε, S → aSa, S → bSb, S → aSb, S → bSa, S → SS}.

c. ({S, A, B, T, a, c}, {a, c}, R, S), where R = {S → AB, A → AA, A → a, B → Tc, T → aT, T → a}.

d. ({S, a, b}, {a, b}, R, S), where R = {S → aSb, S → bSa, S → SS, S → ε}. (G is the grammar that we presented in Example 11.10 for the language L = {w ∈ {a, b}* : #a(w) = #b(w)}.)

e. ({S, a, b}, {a, b}, R, S), where R = {S → aSb, S → aaSb, S → ε}.

14.  Let  G  be  any  context-free  grammar.  Show  that  the  number  of  strings  that  have  a 
derivation  in  G  of  length  n  or  less,  for  any  n  >  0,  is  finite. 

15. Consider the fragment of a Java grammar that is presented in Example 11.20. How could it be changed to force each else clause to be attached to the outermost possible if statement?

16. How does the COND form in Lisp, as described in G.5, avoid the dangling else problem?

17. Consider the grammar G' of Example 11.19.

a. Convert G' to Chomsky normal form.

b. Consider the string id*id+id.

i. Show the parse tree that G' produces for it.

ii. Show the parse tree that your Chomsky normal form grammar produces for it.

18.  Convert  each  of  the  following  grammars  to  Chomsky  normal  form: 

a. S → aSa
   S → B
   B → bbC
   S → bb
   C → ε
   C → cC

b. S → ABC
   A → aC | D
   B → bB | ε | A
   C → Ac | b | Cc
   D → aa

c. S → aTVa
   T → aTa | bTb | ε | V
   V → cVc | ε


CHAPTER  12 


Pushdown  Automata 


Grammars define context-free languages. We'd also like a computational formalism that is powerful enough to enable us to build an acceptor for every context-free language. In this chapter, we describe such a formalism.


12.1  Definition  of  a  (Nondeterministic)  PDA 

A pushdown automaton, or PDA, is a finite state machine that has been augmented by a single stack. In a minute, we will present the formal definition of the PDA model that we will use. But, before we do that, one caveat to readers of other books is in order. There are several competing PDA definitions, from which we have chosen one to present here. All are provably equivalent, in the sense that, for all i and j, if there exists a version_i PDA that accepts some language L then there also exists a version_j PDA that accepts L. We’ll return to this issue in Section 12.5, where we will mention a few of the other models and sketch an equivalence proof. For now, simply beware of the fact that other definitions are also in widespread use.

We will use the following definition: A pushdown automaton (or PDA) M is a sextuple (K, Σ, Γ, Δ, s, A), where:

• K is a finite set of states,

• Σ is the input alphabet,

• Γ is the stack alphabet,

• s ∈ K is the start state,

• A ⊆ K is the set of accepting states, and

• Δ is the transition relation. It is a finite subset of:

(K × (Σ ∪ {ε}) × Γ*) × (K × Γ*),

where the five components are, in order: a state, an input symbol or ε, a string of symbols to pop from the top of the stack, a state, and a string of symbols to push on top of the stack.

A configuration of a PDA M is an element of K × Σ* × Γ*. It captures the three things that can make a difference to M’s future behavior:


•  its  current  state, 

•  the  input  that  is  still  left  to  read,  and 

•  the  contents  of  its  stack. 


The initial configuration of a PDA M, on input w, is (s, w, ε).

We will use the following notational convention for describing M's stack as a string: The top of the stack is to the left of the string. So:

    c
    a      will be written as      cab
    b

If a sequence c_1 c_2 ... c_n of characters is pushed onto the stack, they will be pushed rightmost first, so if the value of the stack before the push was s, the value after the push will be c_1 c_2 ... c_n s.

Analogously to what we did for FSMs, we define the relation yields-in-one-step, written |-M. Yields-in-one-step relates configuration_1 to configuration_2 iff M can move from configuration_1 to configuration_2 in one step. Let c be any element of Σ ∪ {ε}, let γ_1, γ_2 and γ be any elements of Γ*, and let w be any element of Σ*. Then:

(q_1, cw, γ_1 γ) |-M (q_2, w, γ_2 γ) iff ((q_1, c, γ_1), (q_2, γ_2)) ∈ Δ.

Note two things about what a transition ((q_1, c, γ_1), (q_2, γ_2)) says about how M manipulates its stack:

• M may only take the transition if the string γ_1 matches the current top of the stack. If it does, and the transition is taken, then M pops γ_1 and then pushes γ_2. M cannot “peek” at the top of its stack without popping off the values that it examines.

• If γ_1 = ε, then M must match ε against the top of the stack. But ε matches everywhere. So letting γ_1 be ε is equivalent to saying “without bothering to check the current value of the stack.” It is not equivalent to saying, “if the stack is empty.” In our definition, there is no way to say that directly, although we will see that we can create a way by letting M, before it does anything else, push a special marker onto the stack. Then, whenever that marker is on the top of the stack, the stack is otherwise empty.
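
The yields-in-one-step relation translates almost directly into code. The Python sketch below is one possible encoding, with assumptions labeled: a configuration is a (state, remaining input, stack) triple of strings, the stack string has its top at the left, ε is written as the empty string, and the two-transition relation used in the demonstration is a hypothetical parenthesis-matching machine.

```python
def successors(config, delta):
    """All configurations that config yields in one step.

    config is (state, remaining_input, stack); the leftmost stack character
    is the top.  delta is a set of transitions ((q1, c, pop), (q2, push)),
    where c is one input symbol or "" for epsilon.
    """
    state, inp, stack = config
    result = set()
    for (q1, c, pop), (q2, push) in delta:
        # pop == "" matches any stack, including a nonempty one.
        if q1 == state and inp.startswith(c) and stack.startswith(pop):
            result.add((q2, inp[len(c):], push + stack[len(pop):]))
    return result

# Hypothetical transition relation: push on "(", pop a "(" on ")".
DELTA = {
    (("s", "(", ""), ("s", "(")),
    (("s", ")", "("), ("s", "")),
}

if __name__ == "__main__":
    print(successors(("s", "()", ""), DELTA))
    print(successors(("s", ")", "("), DELTA))
    print(successors(("s", ")", ""), DELTA))
```

The last call returns the empty set: with ) as the next input symbol and nothing on the stack to pop, no transition matches.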

The relation yields, written |-M*, is the reflexive, transitive closure of |-M. So configuration C_1 yields configuration C_2 iff:

C_1 |-M* C_2.

A computation by M is a finite sequence of configurations C_0, C_1, ..., C_n for some n ≥ 0 such that:

• C_0 is an initial configuration,

• C_n is of the form (q, ε, γ), for some state q ∈ K and some string γ in Γ*, and

• C_0 |-M C_1 |-M C_2 |-M ... |-M C_n.

Note that we have defined the behavior of a PDA M by a transition relation Δ, not a transition function. Thus we allow nondeterminism. If M is in some configuration (q_1, s, γ), it is possible that:

• Δ contains exactly one transition that matches. In that case, M makes the specified move.

• Δ contains more than one transition that matches. In that case, M chooses one of them. Each choice defines one computation that M may perform.

• Δ contains no transition that matches. In that case, the computation that led to that configuration halts.

Let C be a computation of M on input w ∈ Σ*. Then we will say that:

• C is an accepting computation iff C = (s, w, ε) |-M* (q, ε, ε), for some q ∈ A. Note the strength of this requirement: A computation accepts only if it runs out of input when it is in an accepting state and the stack is empty.

• C is a rejecting computation iff C = (s, w, ε) |-M* (q, w', α), where C is not an accepting computation and where M has no moves that it can make from (q, w', α). A computation can reject only if the criteria for accepting have not been met and there are no further moves (including following ε-transitions) that can be taken.

Let w be a string that is an element of Σ*. Then we will say that:

• M accepts w iff at least one of its computations accepts.

• M rejects w iff all of its computations reject.

The language accepted by M, denoted L(M), is the set of all strings accepted by M. Note that it is possible that, on input w, M neither accepts nor rejects.

In all the examples that follow, we will draw a transition ((q_1, c, γ_1), (q_2, γ_2)) as an arc from q_1 to q_2, labeled c/γ_1/γ_2. So such a transition should be read to say, “If c matches the input and γ_1 matches the top of the stack, the transition from q_1 to q_2 can be taken, in which case c should be removed from the input, γ_1 should be popped from the stack, and γ_2 should be pushed onto it.” If c = ε, then the transition can be taken without consuming any input. If γ_1 = ε, the transition can be taken without checking the stack or popping anything. If γ_2 = ε, nothing is pushed onto the stack when the transition is taken. As we did with FSMs, we will use a double circle to indicate accepting states.

Even very simple PDAs may be able to accept languages that cannot be accepted by any FSM. The power of such machines comes from the ability of the stack to count.


EXAMPLE 12.1 The Balanced Parentheses Language

Consider again Bal = {w ∈ {), (}* : the parentheses are balanced}. The following
one-state PDA M accepts Bal. M uses its stack to count the number of left
parentheses that have not yet been matched. We show M graphically and then as
a sextuple:


M = (K, Σ, Γ, Δ, s, A), where:

K = {s},          (the states)
Σ = {(, )},       (the input alphabet)
Γ = {(},          (the stack alphabet)
A = {s}, and      (the accepting state)
Δ = {((s, (, ε), (s, ()),
     ((s, ), (), (s, ε))}.

If M sees a (, it pushes it onto the stack (regardless of what was already there).
If it sees a ) and there is a ( that can be popped off the stack, M does so. If it sees
a ) and there is no ( to pop, M halts without accepting. If, after consuming its entire
input string, M's stack is empty, M accepts. If the stack is not empty, M rejects.
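The behavior described above can be checked with a small simulator. The following is a minimal sketch, not part of the original text: the function name `accepts`, the constant `EPS`, and the encoding of a transition as a pair of Python tuples are all assumptions made for illustration. It searches breadth-first over configurations (state, remaining input, stack) and accepts iff some computation ends in an accepting state with the input consumed and the stack empty. The search is guaranteed to terminate here because every transition of this machine consumes input; a PDA with ε-transitions that push could need an extra bound.

```python
from collections import deque

EPS = ""  # the empty string stands for ε (an encoding chosen for this sketch)

def accepts(transitions, start, accepting, w):
    """Return True iff at least one computation of the PDA accepts w.

    A configuration is (state, remaining input, stack). The stack is a
    string whose first character is the top of the stack.
    """
    seen = set()
    queue = deque([(start, w, EPS)])
    while queue:
        config = queue.popleft()
        if config in seen:
            continue
        seen.add(config)
        state, rest, stack = config
        # Accept: accepting state, all input consumed, stack empty.
        if state in accepting and rest == EPS and stack == EPS:
            return True
        for (q, c, pop), (p, push) in transitions:
            # c must match the next input (or be ε); pop must match the top.
            if q == state and rest.startswith(c) and stack.startswith(pop):
                queue.append((p, rest[len(c):], push + stack[len(pop):]))
    return False

# The two transitions of the one-state PDA for Bal from Example 12.1.
BAL = [
    (("s", "(", EPS), ("s", "(")),  # push each left parenthesis
    (("s", ")", "("), ("s", EPS)),  # pop a ( for each right parenthesis
]

print(accepts(BAL, "s", {"s"}, "(())()"))
print(accepts(BAL, "s", {"s"}, "(()"))
```

The first call corresponds to a computation that empties the stack exactly when the input runs out; the second leaves an unmatched ( on the stack, so no computation accepts.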


PDAs,  like  FSMs,  can  use  their  states  to  remember  facts  about  the  structure  of  the 
string  that  has  been  read  so  far.  We  see  this  in  the  next  example. 


EXAMPLE 12.2 AnBn

Consider again AnBn = {a^n b^n : n ≥ 0}. The following PDA M accepts AnBn. M
uses its states to guarantee that it only accepts strings that belong to a*b*. It uses
its stack to count a’s so that it can compare them to the b’s. We show M graphically:


[Figure: state diagram of M, with a loop on s labeled a/ε/a, an arc from s to f
labeled b/a/ε, and a loop on f labeled b/a/ε.]

Writing it out, we have M = (K, Σ, Γ, Δ, s, A), where:

K = {s, f},       (the states)
Σ = {a, b},       (the input alphabet)
Γ = {a},          (the stack alphabet)
A = {s, f}, and   (the accepting states)
Δ = {((s, a, ε), (s, a)),
     ((s, b, a), (f, ε)),
     ((f, b, a), (f, ε))}.

Remember  that  M  only  accepts  if,  when  it  has  consumed  its  entire  input  string, 
it  is  in  an  accepting  state  and  its  stack  is  empty.  So,  for  example,  M  will  reject  aaa, 
even  though  it  will  be  in  state  s,  an  accepting  state,  when  it  runs  out  of  input.  The 
stack  at  that  point  will  contain  aaa. 
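To make the acceptance condition concrete, here is a small sketch (an illustration only; the names `ANBN`, `run`, and `accepted` are assumptions, not notation from the text) that follows the three transitions of M listed in Example 12.2. Because M is deterministic, at most one transition applies at each step, so a simple loop is enough. The function `run` returns the final state, any unread input, and the stack, with the top of the stack at the left.

```python
# Transitions of M from Example 12.2, keyed by (state, input, pop).
ANBN = {
    ("s", "a", ""): ("s", "a"),   # push an a for every a read
    ("s", "b", "a"): ("f", ""),   # first b: pop an a and move to f
    ("f", "b", "a"): ("f", ""),   # later b's: pop an a each
}
ACCEPTING = {"s", "f"}

def run(w):
    """Return (state, unread input, stack) when M halts on w."""
    state, stack = "s", ""
    for i, ch in enumerate(w):
        for (q, c, pop), (p, push) in ANBN.items():
            if q == state and c == ch and stack.startswith(pop):
                state, stack = p, push + stack[len(pop):]
                break
        else:
            return state, w[i:], stack  # no move applies: M is stuck
    return state, "", stack

def accepted(w):
    state, rest, stack = run(w)
    # All three conditions are required, not just the accepting state.
    return state in ACCEPTING and rest == "" and stack == ""

print(run("aaa"))       # M halts in s with aaa on the stack
print(accepted("aaa"))
print(accepted("aabb"))
```

On aaa the machine ends in the accepting state s, but the stack still holds aaa, so the string is not accepted.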


EXAMPLE 12.3 WcWR

Let WcWR = {wcw^R : w ∈ {a, b}*}. The following PDA M accepts WcWR:


M moves from state s, in which it is recording w, to state f, in which it is checking
for w^R, when it sees the character c. Since every string in WcWR must contain
the middle c, state s is not an accepting state.


The definition that we have chosen to use for a PDA is flexible; it allows several
symbols to be pushed or popped from the stack in one move. This will turn out to be
particularly useful when we attempt to build PDAs that correspond to practical grammars
that contain rules like T → T * F (the multiplication rule that was part of the
arithmetic expression grammar that we defined in Example 11.19). But we illustrate
the use of this flexibility here on a simple case.


EXAMPLE 12.4 AnB2n

Let AnB2n = {a^n b^(2n) : n ≥ 0}. The following PDA M accepts AnB2n by pushing two
a’s onto the stack for every a in the input string. Then each b pops a single a.


12.2  Deterministic  and  Nondeterministic  PDAs 

The definition of a PDA that we have presented allows nondeterminism. It sometimes
makes sense, however, to restrict our attention to deterministic PDAs. In this section
we will define what we mean by a deterministic PDA. We also show some examples of
the power of nondeterminism in PDAs. Unfortunately, in contrast to the situation with
FSMs, and as we will prove in Theorem 13.13, there exist nondeterministic PDAs for
which no equivalent deterministic PDA exists.

12.2.1  Definition  of  a  Deterministic  PDA 

Define a PDA M to be deterministic iff there exists no configuration of M in which
M has a choice of what to do next. For this to be true, two conditions must hold:

1. Δ_M contains no pairs of transitions that compete with each other.

2. If q is an accepting state of M, then there is no transition ((q, ε, ε), (p, a)) for
any p or a. In other words, M is never forced to choose between accepting and
continuing. Any transitions out of an accepting state must either consume input
(since, if there is remaining input, M does not have the option of accepting) or
pop something from the stack (since, if the stack is not empty, M does not have
the option of accepting).

So far, all of the PDAs that we have built have been deterministic. So each machine
followed only a single computational path.
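The two conditions can be tested mechanically. The sketch below is an illustration under a stated assumption: it treats two transitions as competing when they leave the same state, their input labels are equal or one of them is ε, and one pop string is a prefix of the other, so that both could apply in some configuration. That reading of "compete" and the function names `compete` and `is_deterministic` are assumptions of this sketch, not definitions quoted from the text.

```python
from itertools import combinations

def compete(t1, t2):
    """Assumed test: could both transitions apply in one configuration?"""
    (q1, c1, pop1), _ = t1
    (q2, c2, pop2), _ = t2
    same_state = q1 == q2
    inputs_overlap = c1 == c2 or c1 == "" or c2 == ""
    pops_overlap = pop1.startswith(pop2) or pop2.startswith(pop1)
    return same_state and inputs_overlap and pops_overlap

def is_deterministic(transitions, accepting):
    # Condition 1: no pair of competing transitions.
    if any(compete(t1, t2) for t1, t2 in combinations(transitions, 2)):
        return False
    # Condition 2: no ((q, ε, ε), ...) transition out of an accepting state.
    return not any(q in accepting and c == "" and pop == ""
                   for (q, c, pop), _ in transitions)

# The AnBn machine of Example 12.2 ("" stands for ε).
ANBN = [
    (("s", "a", ""), ("s", "a")),
    (("s", "b", "a"), ("f", "")),
    (("f", "b", "a"), ("f", "")),
]

# A hypothetical machine that may guess, via an ε-move, when to stop pushing.
GUESSING = [
    (("s", "a", ""), ("s", "a")),
    (("s", "", ""), ("f", "")),
    (("f", "a", "a"), ("f", "")),
]

print(is_deterministic(ANBN, {"s", "f"}))
print(is_deterministic(GUESSING, {"f"}))
```

In the second machine the ε-move out of s can always be taken alongside the push, so the pair competes and the check fails.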

12.2.2  Exploiting  Nondeterminism 

But a PDA may be designed to have multiple competing moves from a single configuration.
As with FSMs, the easiest way to envision the operation of a nondeterministic
PDA M is as a tree, as shown in Figure 12.1. Each node in the tree corresponds to a
configuration of M and each path from the root to a leaf node may correspond to one
computation that M might perform.

Notice that the state, the stack, and the remaining input can be different along different
paths. As a result, it will not be possible to simulate all paths in parallel, the way
we did for NDFSMs.

[Figure: a tree of configurations of a nondeterministic PDA.]

FIGURE 12.1 Viewing nondeterminism as search through a space of computation
paths.


EXAMPLE  12.5  Even  Length  Palindromes 

Consider again PalEven = {ww^R : w ∈ {a, b}*}, the language of even-length
palindromes of a’s and b’s. The following nondeterministic PDA M accepts
PalEven:


M is nondeterministic because it cannot know when it has reached the middle
of its input. Before each character is read, it has two choices: It can guess that it
has not yet gotten to the middle. In that case, it stays in state s, where it pushes
each symbol it reads. Or it can guess that it has reached the middle. In that case, it
takes the ε-transition to state f, where it pops one symbol for each symbol that it
reads.
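The guessing can be made visible by searching for an accepting computation and printing it. The sketch below is illustrative only: the transition list `PAL_EVEN` is transcribed from the prose description above (push in s, an ε-move to f, pop a matching symbol in f), which is an assumption about the machine, and `accepting_path` is a hypothetical helper name. It does a depth-first search and returns the sequence of configurations (state, remaining input, stack) of the first accepting computation it finds, or None.

```python
# Assumed transitions of M, read off the prose ("" stands for ε).
PAL_EVEN = [
    (("s", "a", ""), ("s", "a")),   # not yet at the middle: push
    (("s", "b", ""), ("s", "b")),
    (("s", "", ""), ("f", "")),     # guess that the middle has been reached
    (("f", "a", "a"), ("f", "")),   # past the middle: pop a matching symbol
    (("f", "b", "b"), ("f", "")),
]

def accepting_path(w, state="s", stack="", path=()):
    """Depth-first search for one accepting computation on w."""
    path = path + ((state, w, stack),)
    if state == "f" and w == "" and stack == "":
        return path
    for (q, c, pop), (p, push) in PAL_EVEN:
        if q == state and w.startswith(c) and stack.startswith(pop):
            found = accepting_path(w[len(c):], p, push + stack[len(pop):], path)
            if found:
                return found
    return None

for config in accepting_path("abba"):
    print(config)
print(accepting_path("abab"))
```

For abba the accepting computation takes the ε-transition after reading ab; every computation that guesses earlier or later dies without accepting.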


EXAMPLE  12.6  Equal  Numbers  of  a's  and  b's 

Let L = {w ∈ {a, b}* : #a(w) = #b(w)}. Now we don’t know the order in which
the a’s and b’s will occur. They can be interleaved. So, for example, any PDA to accept
L must accept aabbba. The only way to count the number of characters that
have not yet found their mates is to use the stack. So the stack will sometimes
count a’s and sometimes count b’s. It will count whatever it has seen more of. The
following simple PDA accepts L:



This machine is highly nondeterministic. Whenever it sees an a in the input, it
can either push it (which is the right thing to do if it should be counting a’s) or attempt
to pop a b (which is the right thing to do if it should be counting b’s). All the
computations that make the wrong guess will fail to accept since they will not succeed
in clearing the stack. But if #a(w) = #b(w), there will be one computation
that will accept.
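One way to see how many guesses are wasted is to count computations. The sketch below is an illustration, with the four transitions inferred from the prose (for each input symbol, either push it or pop its opposite); that transition list and the name `count_computations` are assumptions, since the diagram itself is not reproduced here. It enumerates every complete computation on w and reports how many there are and how many of them accept.

```python
# Assumed one-state machine for equal numbers of a's and b's ("" is ε).
EQUAL_AB = [
    (("s", "a", ""), ("s", "a")),   # count this a
    (("s", "a", "b"), ("s", "")),   # or cancel it against a stacked b
    (("s", "b", ""), ("s", "b")),   # count this b
    (("s", "b", "a"), ("s", "")),   # or cancel it against a stacked a
]

def count_computations(w, stack=""):
    """Return (total, accepting) over all complete computations on w."""
    if w == "":
        # Input consumed: this computation accepts iff the stack is empty.
        return 1, int(stack == "")
    total = accepting = 0
    for (_, c, pop), (_, push) in EQUAL_AB:
        if w[0] == c and stack.startswith(pop):
            t, a = count_computations(w[1:], push + stack[len(pop):])
            total += t
            accepting += a
    return total, accepting

print(count_computations("aabbba"))
print(count_computations("aab"))
```

Under these assumed transitions, only one of the computations on aabbba clears the stack; the others make at least one wrong guess.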


EXAMPLE  12.7  The  a  Region  and  the  b  Region  are  Different 

Let L = {a^m b^n : m ≠ n; m, n > 0}. We want to build a PDA M to accept L. It is
hard to build a machine that looks for something negative, like ≠. But we can
break L into two sublanguages: {a^m b^n : 0 < m < n} and {a^m b^n : 0 < n < m}.
Either there are more a’s or more b’s. M must accept any string that is in either of
those sublanguages. So M is:


As long as M sees a’s, it stays in state 1 and pushes each a onto the stack. When
it sees the first b, it goes to state 2. It will accept nothing but b’s from that point on.
So far, its behavior has been deterministic. But, from state 2, it must make choices.
Each time it sees another b and there is an a on the stack, it should consume the b
and pop the a and stay in state 2. But, in order to accept, it must eventually either
read at least one b that does not have a matching a or pop an a that does not have
a matching b. It should do the former (and go to state 4) if there is a b in the input
stream when the stack is empty. But we have no way to specify that a move can be
taken only if the stack is empty. It should do the latter (and go to state 3) if there is
an a on the stack but the input stream is empty. But we have no way to specify that
the input stream is empty.

As  a  result,  in  most  of  its  moves  in  state  2,  M  will  have  a  choice  of  three  paths 
to  take.  All  but  the  correct  one  will  die  out  without  accepting.  But  a  good  deal  of 
computational  effort  will  be  wasted  first. 


In  the  next  section,  we  present  techniques  for  reducing  nondeterminism  caused  by 
the  two  problems  we've  just  presented: 

•  A  transition  that  should  be  taken  only  if  the  stack  is  empty,  and 

•  A  transition  that  should  be  taken  only  if  the  input  stream  is  empty. 

But  first  we  present  one  additional  example  of  the  power  of  nondeterminism. 


EXAMPLE 12.8 ¬AnBnCn

Let’s first consider AnBnCn = {a^n b^n c^n : n ≥ 0}. If we try to think about building
a PDA to accept AnBnCn, we immediately run into trouble. We can use the stack
to count a’s and then compare them to the b’s. But then the stack will be empty
and it won’t be possible to compare the c’s. We can try to think of something
clever to get around this problem, but we will fail. We’ll prove in Chapter 13 that
no PDA exists to accept this language.

But now let L = ¬AnBnCn. There is a PDA that accepts L. L = L1 ∪ L2, where:

• L1 = {w ∈ {a, b, c}* : the letters are out of order}.

• L2 = {a^i b^j c^k : i, j, k ≥ 0 and (i ≠ j or j ≠ k)} (in other words, not equal
numbers of a’s, b’s, and c’s).
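The decomposition L = L1 ∪ L2 can be sanity-checked with a direct membership test. The sketch below is not a PDA and makes no claim about how one would be built; it only mirrors the two definitions, assuming the input is a string over {a, b, c}. The function names `in_L1`, `in_L2`, and `in_complement` are hypothetical.

```python
import re

def in_L1(w):
    """Letters out of order: w is not of the form a*b*c*."""
    return re.fullmatch(r"a*b*c*", w) is None

def in_L2(w):
    """w = a^i b^j c^k with i != j or j != k."""
    m = re.fullmatch(r"(a*)(b*)(c*)", w)
    if m is None:
        return False
    i, j, k = (len(group) for group in m.groups())
    return i != j or j != k

def in_complement(w):
    """Membership in ¬AnBnCn, as the union of L1 and L2."""
    return in_L1(w) or in_L2(w)

print(in_complement("aabbcc"))  # in AnBnCn, so not in its complement
print(in_complement("aabbc"))   # letters in order, but the counts differ
print(in_complement("cba"))     # letters out of order
```

A string fails both tests only when its letters are in order and all three counts agree, which is exactly membership in AnBnCn.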

A simple FSM can accept L1. So we focus on L2. It turns out to be easier to
check for a mismatch in the number of a’s, b’s, and c’s than to check for a match
because, to detect a mismatch, it is sufficient to find one thing wrong. It is not
necessary to compare everything. So a string w is in L2 iff either (or both) the a’s and
b’s don’t match or the b’s and c’s don’t match. We can build PDAs, such as the one
we built in Example 12.7, to check each of those conditions. So we can build a
straightforward PDA for L. It first guesses which condition to check for. Then
submachines do the checking. We sketch a PDA for L here and leave the details
as an exercise:


This last example is significant for two reasons:

• It illustrates the power of nondeterminism.

• It proves that the class of languages acceptable by PDAs is not closed under
complement. We’ll have more to say about that in Section 13.4.

An important fact about the context-free languages, in contrast to the regular ones, is
that nondeterminism is more than a convenient design tool. In Section 13.5 we will define
the deterministic context-free languages to be those that can be accepted by some
deterministic PDA that may exploit an end-of-string marker. Then we will prove that there are
context-free languages that are not deterministic in this sense. Thus there exists, for the
context-free languages, no equivalent of the regular language algorithm ndfsmtodfsm.
There are, however, some techniques that can be used to reduce nondeterminism in many
of the kinds of cases that often occur. We’ll sketch two of them in the next section.


12.2.3  Techniques  for  Reducing  Nondeterminism 


In Example 12.7, we saw nondeterminism arising from two very specific circumstances:


• A transition that should be taken only if the stack is empty competes against one or
more moves that require a match of some string on the stack, and

• A transition that should be taken only if the input stream is empty competes
against one or more moves that require a match against a specific input character.

Both of these circumstances are common, so we would like to find a way to reduce
or eliminate the nondeterminism that they cause.



We first consider the case in which the nondeterminism could be eliminated if it
were possible to check for an empty stack. Although our PDA model does not provide
a way to do that directly, it is easy to simulate. Any PDA M that would like to be able
to check for empty stack can simply, before it does anything else, push a special character
onto the stack. The stack is then logically empty iff that special character is at the
top of the stack. The only thing we must be careful about is that, before M can accept
a string, its stack must be completely empty. So the special character must be popped
whenever M reaches an accepting state.


EXAMPLE  12.9  Using  a  Bottom  of  Stack  Marker 

We can use the special, bottom-of-stack marker technique to reduce the
nondeterminism in the PDA that we showed in Example 12.7. We’ll use # as the marker.
When we do that, we get the following PDA M′:


Now the transition back to state 2 no longer competes with the transition to
state 4, which can only be taken when the # is the only symbol left on the stack. M′
is still nondeterministic though, because the transition back to state 2 competes
with the transition to state 3. We still don’t have a way to specify that M′ should go
to state 3 only if it has run out of input.


Next we consider the “out of input” problem. To solve that one, we will make a
change to the input language. Instead of building a machine to accept a language L,
we’ll build one to accept L$, where $ is a special end-of-string marker. In any practical
system, we would probably choose <newline> or <cr> or <enter>, rather than $, but
we’ll use $ here because it is easy to see.


EXAMPLE  12.10  Using  an  End-of-String  Marker 

We can use the end-of-string marker technique to eliminate the remaining
nondeterminism in the PDAs that we showed in Example 12.7 and Example 12.9. When
we do that, we get the following PDA M″:


Now the transition back to state 2 no longer competes with the transition to
state 3, since the latter can only be taken when the $ is read. Notice that we must
be careful to read the $ on all paths, not just the one where we needed it.
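The effect of the two markers can be illustrated with a machine that never has a choice. The transition table below is a sketch written for this illustration, not a transcription of the diagram in the example: it assumes # as the bottom-of-stack marker, $ as the end-of-string marker, states numbered 0 through 5 (with 0 a new start state and 5 the single accepting state), and the names `MARKED` and `run`. The runner asserts that at most one transition is applicable at every step, which is what determinism requires.

```python
# Assumed deterministic machine for {a^m b^n : m != n; m, n > 0}$.
# Format: ((state, input, pop), (new state, push)); "" stands for ε.
MARKED = [
    ((0, "", ""), (1, "#")),    # push the bottom-of-stack marker first
    ((1, "a", ""), (1, "a")),   # count the a's
    ((1, "b", "a"), (2, "")),   # first b: match it against an a
    ((2, "b", "a"), (2, "")),   # keep matching b's against a's
    ((2, "b", "#"), (4, "")),   # a b with no a left: more b's than a's
    ((2, "$", "a"), (3, "")),   # end of input with an a left: more a's
    ((3, "", "a"), (3, "")),    # clear the remaining a's
    ((3, "", "#"), (5, "")),    # pop the marker and accept
    ((4, "b", ""), (4, "")),    # read any remaining b's
    ((4, "$", ""), (5, "")),    # read the $ on this path too
]
ACCEPTING = {5}

def run(w):
    """Run the machine on w (which should end in $); return True iff it accepts."""
    state, rest, stack = 0, w, ""
    while True:
        moves = [(p, rest[len(c):], push + stack[len(pop):])
                 for (q, c, pop), (p, push) in MARKED
                 if q == state and rest.startswith(c) and stack.startswith(pop)]
        assert len(moves) <= 1, "a deterministic PDA never has a choice"
        if not moves:
            return state in ACCEPTING and rest == "" and stack == ""
        state, rest, stack = moves[0]

for w in ["aab$", "abb$", "ab$", "aabb$"]:
    print(w, run(w))
```

Note that the $ is read on both accepting paths, in line with the remark above, and that the marker # is popped before the machine accepts so that the stack is truly empty.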


Adding an end-of-string marker to the language to be accepted is a powerful tool for
reducing nondeterminism. In Section 13.5, we’ll define the class of deterministic
context-free languages to be exactly the set of context-free languages L such that L$ can be
accepted by some deterministic PDA. We’ll do that because, for practical reasons, we
would like the class of deterministic context-free languages to be as large as possible.


12.3  Equivalence  of  Context-Free  Grammars  and  PDAs 

So far, we have shown PDAs to accept several of the context-free languages for which
we  wrote  grammars  in  Chapter  11.  This  is  no  accident.  In  this  section  we’ll  prove,  as 
usual  by  construction,  that  context-free  grammars  and  pushdown  automata  describe 
exactly  the  same  class  of  languages. 


12.3.1  Building  a  PDA  from  a  Grammar 

THEOREM  12.1  For  Every  CFG  There  Exists  an  Equivalent  PDA 

Theorem: Given a context-free grammar G = (V, Σ, R, S), there exists a PDA M
such that L(M) = L(G).

Proof: The proof is by construction. There are two equally straightforward ways to
do this construction, so we will describe both of them. Either of them can be converted
to a practical parser (a recognizer that returns a parse tree if it accepts) by
adding simple tree-building operations associated with each stack operation.
We'll see how in Chapter 15.

Top-down parsing: A top-down parser answers the question, “Could G generate
w?” by starting with S, applying the rules of R, and seeing whether w can
be derived. We can build a PDA that does exactly that. We will define the algorithm
cfgtoPDAtopdown(G), which, from a grammar G, builds a corresponding
PDA M that, on input w, simulates G attempting to produce a leftmost derivation
of w. M will have two states. The only purpose of the first state is to push S
onto the stack and then go to the second state. M’s stack will actually do all the
work by keeping track of what G is trying to derive. Initially, of course, that is S,
which is why M begins by pushing S onto the stack. But suppose that R contains
a rule of the form S → γ1γ2...γn. Then M can replace its goal of generating an S
by the goal of generating a γ1, followed by a γ2, and so forth. So M can pop S off
the stack and replace it by the sequence of symbols γ1γ2...γn (with γ1 on top).
As long as the symbol on the top of the stack is a nonterminal in G, this process
continues, effectively applying the rules of G to the top of the stack (thus producing
a leftmost derivation).

The appearance of a terminal symbol c on the top of the stack means that G is
attempting to generate c. M only wants to pursue paths that generate its input
string w. So, at that point, it pops the top symbol off the stack, reads its next input
character, and compares the two. If they match, the derivation that M is pursuing
is consistent with generating w and the process continues. If they don't match, the
path that M is currently following ends without accepting. So, at each step, M either
applies a grammar rule, without consuming any input, or it reads an input
character and pops one terminal symbol off the stack.

When M has finished generating each of the constituents of the S it pushed
initially, its stack will become empty. If that happens at the same time that M
has read all the characters of w, G can generate w, so M accepts. It will do so
since its second state will be an accepting state. Parsers with a structure like M's
are called top-down parsers. We'll have more to say about them in Section 15.2.

As an example, suppose that R contains the rules A → a, B → b and
S → AAB. Assume that the input to M is aab. Then M first pushes S onto the
stack. Next it applies its third rule, pops S off, and replaces it by AAB. Then it applies
its first rule, pops off A, and replaces it by a. The stack is then aAB. At that
point, it reads the first character of its input, pops a, compares the two characters,
sees that they match, and continues. The stack is then AB. Again M applies
its first rule, pops off A, and replaces it by a. The stack then is aB. Then it reads
the next character of its input, pops a, compares the two characters, sees that
they match, and continues. The stack is then B. M applies its second rule, pops
off B, and replaces it by b. It reads the last input character, pops off b, compares
the two characters, and sees that they match. At that point, M is in an accepting
state and both the stack and the input stream are empty, so M accepts. The outline
of M is shown in Figure 12.2.

[Figure: a two-state PDA. The loop on the second state carries all but the first of the
transitions described below.]

FIGURE 12.2 A PDA that parses top-down.

Formally, M = ({p, q}, Σ, V, Δ, p, {q}), where Δ contains:

• The start-up transition ((p, ε, ε), (q, S)), which pushes the start symbol onto
the stack and goes to state q.

• For each rule X → γ1γ2...γn in R, the transition ((q, ε, X), (q, γ1γ2...γn)),
which replaces X by γ1γ2...γn. If n = 0 (i.e., the right-hand side of the rule is
ε), then the transition is ((q, ε, X), (q, ε)).

• For each character c ∈ Σ, the transition ((q, c, c), (q, ε)), which compares an
expected character from the stack against the next input character and continues
if they match.

So  we  can  define: 

cfgtoPDAtopdown  (G:  CFG)  = 

From  G,  construct  M  as  defined  above. 
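The construction is short enough to write out. The following sketch is one possible rendering, under stated assumptions: every grammar symbol is a single character, a rule is a (left-hand side, right-hand side) pair of strings, "" stands for ε, and the names `cfg_to_pda_topdown` and `accepts` are invented for this illustration. The simulator prunes any configuration whose stack exceeds `max_stack`; that bound is an addition of this sketch, needed only so that the search terminates on grammars (such as left-recursive ones) whose rule-application transitions could grow the stack forever.

```python
from collections import deque

def cfg_to_pda_topdown(terminals, rules, start):
    """Build the transition relation of the two-state top-down PDA."""
    delta = [(("p", "", ""), ("q", start))]                        # start-up
    delta += [(("q", "", lhs), ("q", rhs)) for lhs, rhs in rules]  # apply a rule
    delta += [(("q", c, c), ("q", "")) for c in terminals]         # match a terminal
    return delta

def accepts(delta, w, max_stack=16):
    """Breadth-first search over configurations (state, input, stack)."""
    seen, queue = set(), deque([("p", w, "")])
    while queue:
        config = queue.popleft()
        state, rest, stack = config
        if config in seen or len(stack) > max_stack:
            continue
        seen.add(config)
        if state == "q" and rest == "" and stack == "":
            return True
        for (q, c, pop), (p, push) in delta:
            if q == state and rest.startswith(c) and stack.startswith(pop):
                queue.append((p, rest[len(c):], push + stack[len(pop):]))
    return False

# The rules used in the walk-through: A -> a, B -> b, S -> AAB.
RULES = [("A", "a"), ("B", "b"), ("S", "AAB")]
M = cfg_to_pda_topdown("ab", RULES, "S")

for transition in M:
    print(transition)
print(accepts(M, "aab"))
print(accepts(M, "ab"))
```

The machine has one start-up transition, one transition per rule, and one per terminal, so six in total for this grammar.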

Bottom-up parsing: A bottom-up parser answers the question, “Could G generate
w?” by starting with w, applying the rules of R backwards, and seeing
whether S can be reached. We can build a PDA that does exactly that. We will define
the algorithm cfgtoPDAbottomup(G), which, from a grammar G, builds a corresponding
PDA M that, on input w, simulates the construction, backwards, of a
rightmost derivation of w in G. Again, M will have two states, but this time all the
work will happen in the first one. In the top-down approach that we described
above, the entries in the stack corresponded to expectations: to constituents that
G was trying to derive. In the bottom-up approach that we are describing now, the
objects in the stack will correspond to constituents that have actually been found
in the input. If M ever finds a complete S that covers its entire input, then it should
accept. So if, when M runs out of input, the stack contains a single S, it will accept.
M will be able to perform two kinds of actions:

• M can read an input symbol and shift it onto the stack.

• Whenever a sequence of elements at the top of the stack matches, in reverse,
the right-hand side of some rule r in R, M can pop that sequence off and replace
it by the left-hand side of r. When this happens, we say that M has
reduced by rule r.

Because of the two actions that it can perform, a parser based on a PDA like
M is called a shift-reduce parser. We'll have more to say about how such parsers
work in Section 15.3. For now, we just observe that they simulate, backwards, a
rightmost derivation.

[Figure: a two-state PDA. The loop on the first state carries all but the last of the
transitions described below.]

FIGURE 12.3 A PDA that parses bottom-up.


To see how M might work, suppose that R contains the rules A → a, B → b
and S → AAB. Assume that the input to M is aab. Then M first shifts a onto the
stack. The top of the stack matches the right-hand side of the first rule. So M can
apply the rule, pop off a, and replace it with A. Then it shifts the next a, so the
stack is aA. It reduces by the first rule again, so the stack is AA. It shifts the b, applies
the second rule, and leaves the stack as BAA. At that point, the top of the
stack matches, in reverse, the right-hand side of the third rule. The string is reversed
because the leftmost symbol was read first and so is at the bottom of the
stack. M will pop off BAA and replace it by S.

To accept, M must pop S off the stack, leave the stack empty, and go to its second
state, which will accept. The outline of M is shown in Figure 12.3.

Formally, M = ({p, q}, Σ, V, Δ, p, {q}), where Δ contains:

• The shift transitions: ((p, c, ε), (p, c)), for each c ∈ Σ.

• The reduce transitions: ((p, ε, (γ1γ2...γn)^R), (p, X)), for each rule
X → γ1γ2...γn in R.

• The finish up transition: ((p, ε, S), (q, ε)).

So we can define:

cfgtoPDAbottomup (G: CFG) =

From  G,  construct  M  as  defined  above. 


EXAMPLE  12.11  Using  cfgtoPDAtopdown  and  cfgtoPDAbottomup 

Consider Expr, our simple expression language, defined by G = {{E, T, F, id,
+, *, (, )}, {id, +, *, (, )}, R, E}, where:

R = { E → E + T
      E → T
      T → T * F
      T → F
      F → (E)
      F → id }.

We show two PDAs, Ma and Mb, that accept Expr. We can use the function
cfgtoPDAtopdown(G) to build Ma =

[Figure: Ma. The arc from p to q is labeled ε/ε/E; the loop on q carries the
transitions listed below.]

(1) (q, ε, E), (q, E + T)

(2) (q, ε, E), (q, T)

(3) (q, ε, T), (q, T * F)

(4) (q, ε, T), (q, F)

(5) (q, ε, F), (q, (E))

(6) (q, ε, F), (q, id)

(7) (q, id, id), (q, ε)

(8) (q, (, (), (q, ε)

(9) (q, ), )), (q, ε)

(10) (q, +, +), (q, ε)

(11) (q, *, *), (q, ε)

We can use cfgtoPDAbottomup(G) to build Mb =


(1) (p, id, ε), (p, id)

(2) (p, (, ε), (p, ()

(3) (p, ), ε), (p, ))

(4) (p, +, ε), (p, +)

(5) (p, *, ε), (p, *)

(6) (p, ε, T + E), (p, E)

(7) (p, ε, T), (p, E)

(8) (p, ε, F * T), (p, T)

(9) (p, ε, F), (p, T)

(10) (p, ε, )E( ), (p, F)

(11) (p, ε, id), (p, F)
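The bottom-up construction can be exercised on the Expr grammar. The sketch below is illustrative and rests on several assumptions: input is a sequence of tokens (so that id is one symbol), stacks are tuples with the top at index 0, `None` stands for ε on the input side, and the names `cfg_to_pda_bottomup` and `accepts` are invented here. The search terminates for this grammar because a shift consumes a token and no reduction lengthens the stack, so only finitely many configurations exist.

```python
from collections import deque

RULES = [("E", ("E", "+", "T")), ("E", ("T",)),
         ("T", ("T", "*", "F")), ("T", ("F",)),
         ("F", ("(", "E", ")")), ("F", ("id",))]
TERMINALS = ["id", "(", ")", "+", "*"]

def cfg_to_pda_bottomup(terminals, rules, start):
    """Build shift, reduce, and finish-up transitions of the two-state PDA."""
    shift = [(("p", c, ()), ("p", (c,))) for c in terminals]
    # A reduce pops the right-hand side in reverse, since its last symbol is on top.
    reduce_ = [(("p", None, tuple(reversed(rhs))), ("p", (lhs,)))
               for lhs, rhs in rules]
    finish = [(("p", None, (start,)), ("q", ()))]
    return shift + reduce_ + finish

def accepts(delta, tokens):
    """Breadth-first search over configurations (state, tokens left, stack)."""
    seen, queue = set(), deque([("p", tuple(tokens), ())])
    while queue:
        config = queue.popleft()
        if config in seen:
            continue
        seen.add(config)
        state, rest, stack = config
        if state == "q" and not rest and not stack:
            return True
        for (q, c, pop), (p, push) in delta:
            if q != state or stack[:len(pop)] != pop:
                continue
            if c is None:                      # ε-move: reduce or finish up
                queue.append((p, rest, push + stack[len(pop):]))
            elif rest and rest[0] == c:        # shift the next token
                queue.append((p, rest[1:], push + stack[len(pop):]))
    return False

Mb = cfg_to_pda_bottomup(TERMINALS, RULES, "E")
print(accepts(Mb, ["id", "+", "id", "*", "id"]))
print(accepts(Mb, ["id", "+"]))
```

The reduce transition built for E → E + T pops T, +, E in that order, matching transition (6) in the list above.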


The  theorem  that  we  just  proved  is  important  for  two  very  different  kinds  of  reasons: 

•  It  is  theoretically  important  because  we  will  use  it  to  prove  one  direction  of  the 
claim  that  context-free  grammars  and  PDAs  describe  the  same  class  of  languages. 
For  this  purpose,  all  we  care  about  is  the  truth  of  the  theorem. 

• It is of great practical significance. The languages we use to communicate with programs
are, in the main, context-free. Before an application can assign meaning to
our programs, our queries, and our marked up documents, it must parse the statements
that we have written. Consider either of the PDAs that we built in our proof
of this theorem. Each stack operation of either of them corresponds to the building
of a piece of the parse tree that corresponds to the derivation that the PDA found.
So we can go a long way toward building a parser by simply augmenting one of the
PDAs that we just built with a mechanism that associates a tree-building operation
with each stack action. Because the PDAs follow the structure of the grammar, we
can guarantee that we get the parses we want by writing appropriate grammars. In
truth, building efficient parsers is more complicated than this. We’ll have more to
say about the issues in Chapter 15.

12.3.2  Building  a  Grammar  from  a  PDA  • 

We next show that it is possible to go the other way, from a PDA to a grammar. Unfortunately,
the process is not as straightforward as the grammar-to-PDA process. Fortunately,
for applications, it is rarely (if ever) necessary to go in this direction.

Restricted  Normal  Form 

The grammar-creation algorithm that we are about to define must make some assumptions
about the structure of the PDA to which it is applied. So, before we present that
algorithm, we will define what we'll call restricted normal form for PDAs. A PDA M is
in restricted normal form iff:

1. M has a start state s' that does nothing except push a special symbol onto the
stack and then transfer to a state s from which the rest of the computation begins.
There must be no transitions back to s'. The special symbol must not be used in
any other way in M. We will use # to stand for such a symbol.

2. M has a single accepting state a. All transitions into a pop # and read no input.

3. Every transition in M, except the one from s', pops exactly one symbol from the
stack.

As with other normal forms, in order for restricted normal form to be useful, we
must define an algorithm that converts an arbitrary PDA M = (K, Σ, Γ, Δ, s, A) into
it. Given M, convertPDAtorestricted builds a new PDA M' such that L(M') = L(M)
and M' is in restricted normal form.

convertPDAtorestricted(M: PDA) =

1. Initially, let M' = M.

/* Establish property 1:

2. Create a new start state s'.

3. Add the transition ((s', ε, ε), (s, #)).

/* Establish property 2:

4. Create a new accepting state a.

5. For each accepting state q in M do:

5.1. Create the transition ((q, ε, #), (a, ε)).

5.2. Remove q from the set of accepting states (making a the only accepting
state in M').

/* Establish property 3:

/* Assure that no more than one symbol is popped at each transition:

6. For every transition t that pops k symbols, where k > 1 do:

6.1. Replace t with k transitions, each of which pops a single symbol. Create
additional states as necessary to do this. Only if the last of the k symbols
can be popped should any input be read or any new symbols pushed.
Specifically, let qq_1, qq_2, ..., qq_{k-1} be new state names. Then:

Replace ((q_1, c, γ_1γ_2...γ_k), (q_2, γ_P)) with:

((q_1, ε, γ_1), (qq_1, ε)), ((qq_1, ε, γ_2), (qq_2, ε)), ...,

((qq_{k-1}, c, γ_k), (q_2, γ_P)).

/* Assure that exactly one symbol is popped at each transition. We already
know that no more than one will be. But perhaps none were. In that case, what
M' needs to do instead is to pop whatever was on the top of the stack and then
just push it right back. So we'll need one new transition for every symbol that
might be on the top of the stack. Note that, because of the existence of the
bottom-of-stack marker #, we are guaranteed that the stack will not be empty, so
there will always be a symbol that can be popped.

7. For every transition t = ((q_1, c, ε), (q_2, γ)) do:

7.1. Replace t with |Γ_M'| transitions, each of which pops a single symbol and
then pushes it back on. Specifically, for each symbol α in Γ_M ∪ {#}, add
the transition ((q_1, c, α), (q_2, γα)).

8. Return M'.
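The steps of convertPDAtorestricted can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive implementation: a transition is represented as ((q, c, pop), (r, push)), the empty string stands for ε, the first character of a stack string is the top of the stack, and the names s', a, and qq1, qq2, ... are assumed not to clash with existing state names. The transition out of s' is set aside during step 7 because property 3 exempts it.

```python
from itertools import count

def convert_pda_to_restricted(states, gamma, delta, start, accepting):
    """Return (states, stack alphabet, transitions, start, accepting set)
    for an equivalent PDA in restricted normal form."""
    states, delta = set(states), set(delta)
    fresh = (f"qq{i}" for i in count(1))      # assumed-unused state names

    # Steps 2-3 (property 1): a new start state that only pushes '#'.
    new_start, final = "s'", "a"
    states |= {new_start, final}
    start_transition = ((new_start, "", ""), (start, "#"))

    # Steps 4-5 (property 2): one accepting state, entered by popping '#'.
    delta |= {((q, "", "#"), (final, "")) for q in accepting}

    # Step 6: split each transition that pops k > 1 symbols into k single pops.
    for t in [t for t in delta if len(t[0][2]) > 1]:
        (q1, c, pop), (q2, push) = t
        delta.remove(t)
        chain = [q1] + [next(fresh) for _ in pop[:-1]]
        states |= set(chain[1:])
        for i, sym in enumerate(pop[:-1]):
            delta.add(((chain[i], "", sym), (chain[i + 1], "")))
        # Only the last pop reads input and pushes.
        delta.add(((chain[-1], c, pop[-1]), (q2, push)))

    # Step 7: a transition that pops nothing is replaced by one transition per
    # possible top-of-stack symbol, which pops it and pushes it right back.
    stack_symbols = set(gamma) | {"#"}
    for t in [t for t in delta if t[0][2] == ""]:
        (q1, c, _), (q2, push) = t
        delta.remove(t)
        delta |= {((q1, c, x), (q2, push + x)) for x in stack_symbols}

    delta.add(start_transition)               # exempt from property 3
    return states, stack_symbols, delta, new_start, {final}

# The WcWR machine, assuming the usual two-state PDA with states s and f.
wcwr = {
    (("s", "a", ""), ("s", "a")), (("s", "b", ""), ("s", "b")),
    (("s", "c", ""), ("f", "")),
    (("f", "a", "a"), ("f", "")), (("f", "b", "b"), ("f", "")),
}
states, gamma, delta, start, accepting = convert_pda_to_restricted(
    {"s", "f"}, {"a", "b"}, wcwr, "s", {"f"})
```

On this input the three transitions from s that pop nothing are each replaced by three transitions, one per symbol in {a, b, #}.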


EXAMPLE  12.12  Converting  to  Restricted  Normal  Form 

Let WcWR = {wcw^R : w ∈ {a, b}*}. A straightforward PDA M that accepts
WcWR is the one we showed in Example 12.3:


M is not in restricted normal form. To create an equivalent PDA M', we first
create new start and accepting states and connect them to M:


M'  contains  no  transitions  that  pop  more  than  one  symbol.  And  it  contains  no 
transitions  that  push  more  than  one  symbol.  But  it  does  contain  transitions  that 
pop nothing. Since Γ_M' = {a, b, #}, the three transitions from state s must be
replaced by the following nine transitions:

((s, a, #), (s, a#)), ((s, a, a), (s, aa)), ((s, a, b), (s, ab)),

((s, b, #), (s, b#)), ((s, b, a), (s, ba)), ((s, b, b), (s, bb)),

((s, c, #), (f, #)), ((s, c, a), (f, a)), ((s, c, b), (f, b)).

Building the Grammar

Since we have now shown that any PDA can be converted into an equivalent one in
restricted normal form, we can show that, for any PDA M, there exists a context-free
grammar that generates L(M) by first converting M to restricted normal form and then
constructing a grammar.


THEOREM  12.2  For  Every  PDA  There  Exists  an  Equivalent  CFG 

Theorem: Given a PDA M = (K, Σ, Γ, Δ, s, A), there exists a CFG G = (V, Σ, R, S)
such that L(G) = L(M).

Proof: The proof is by construction. In the proof of Theorem 12.1, we showed how
to use a PDA to simulate a grammar. Now we show how to use a grammar to simulate
a PDA. The basic idea is simple: The productions of the grammar will simulate
the moves of the PDA. Unfortunately, the details get messy.

The first step of the construction of G will be to build from M, using the
algorithm convertPDAtorestricted that we just defined, an equivalent PDA M', where
M' is in restricted normal form. So every machine that the grammar-construction
algorithm must deal with will look like this (with the part in the middle that
actually does the work indicated with ...):


G, the grammar that we will build, will exploit a collection of nonterminal
symbols to which we will give names of the following form:

<q_i, γ, q_j>.

The job of a nonterminal <q_i, γ, q_j> is to generate all and only the strings that
can drive M from state q_i with the symbol γ on the stack to state q_j, having
popped off the stack γ and anything else that got pushed on top of it in the
process of going from q_i to q_j. So, for example, in the machine M' that we
described above in Example 12.12, the job of <s, #, a> is to generate all the strings
that could take M' from s with # on the top of the stack to a, having popped the #
(and anything else that got pushed along the way) off the stack. But notice that
that is exactly the set of strings that M' will accept. So G will contain the rule:

S → <s, #, a>.

Now we need to describe the rules that will have <s, #, a> on their left-hand
sides. They will make use of additional nonterminals. For example, M' from
Example 12.12 must go through state f on its way to a. So there will be the
nonterminal <f, #, a>, which describes the set of strings that can drive M' from f to
a, popping #. That set is, of course, {ε}.

How can an arbitrary machine M get from one state to another? Because M is
in restricted normal form, we must consider only the following three kinds of
transitions, all of which pop exactly one symbol:

• Transitions that push no symbols: Suppose that there is such a transition
((q, c, γ), (r, ε)), where c ∈ Σ ∪ {ε}. We consider how such a transition can
participate in a computation of M:

[Figure: the transition c/γ/ε from q to r, followed by a path from r to w.]

If this transition is taken, then M reads c, pops γ, and then moves to r. After
doing that, it may follow any available paths from r to any next state w, where
w may be q or r or any other state. So consider the nonterminal <q, γ, w>, for
any state w. Its job is to generate all strings that drive M from q to w while
popping off γ. We now know how to describe at least some of those strings: They
are the ones that start with c and are followed by any string that could drive M
from r to w without popping anything (since the only thing we need to pop, γ,
has already been popped). So we can write the rule:

<q, γ, w> → c<r, ε, w>.

Read this rule to say that M can go from q to w, leaving the stack just as it
was except that a γ on the top has been popped, by reading c, popping γ,
going to r, and then somehow getting from r to w, leaving the stack just as it
was. Since M reads c, G must generate it.

Every transition in M of the form ((q, c, γ), (r, ε)) generates one grammar
rule, like the one above, for every state w in M, except s'.

• Transitions that push one symbol: This situation is similar to the case where M
pushes no symbols except that whatever computation follows must pop the
symbol that this transition pushes. So, suppose that M contains:

[Figure: the transition c/γ/α from q to r, followed by a path from r to w.]

If this transition is taken, then M reads the character c, pops γ, pushes α,
and then moves to r. After doing that, it may follow any available paths from r
to any next state w, where w may be q or r or any other state. So consider the
nonterminal <q, γ, w>, for any state w. Its job is to generate all strings that
drive M from q to w while popping off γ. We now know how to describe at
least some of those strings: They are the ones that start with c and are followed
by any string that could drive M from r to w while popping the α that just got
pushed. So we can write the rule:

<q, γ, w> → c<r, α, w>.

Read this rule to say that M can go from q to w, leaving the stack just as it
was except that a γ on the top has been popped, by reading c, popping γ,
pushing α, going to r, and then somehow getting from r to w, leaving the stack
just as it was except that an α on the top has been popped.

Every transition in M of the form ((q, c, γ), (r, α)) generates one grammar
rule, like the one above, for every state w in M, except s'.

• Transitions that push two symbols: This situation is a bit more complicated
since two symbols are pushed and must then be popped.

[Figure: the transition c/γ/αβ from q to r, followed by a path from r to v and a path from v to w.]

If this transition is taken, then M reads c, pops γ, pushes two characters
αβ, and then moves to r. Now suppose that we again want to consider strings
that drive M from q to w, where the only change to the stack is to pop the γ
that gets popped on the way from q to r. This time, two symbols have been
pushed, so both must subsequently be popped. Since M is in restricted normal
form, it can pop only a single symbol on each transition. So the only way to go
from r to w and pop both symbols is to visit another state in between the two.
Call it v, as shown in the figure. We now know how to describe at least some
of the strings that drive M from q to w, popping γ: They are the ones that start
with c and are followed first by any string that could drive M from r to v while
popping α and then by any string that could drive M from v to w while
popping β. So we can write the rule:

<q, γ, w> → c<r, α, v><v, β, w>.

Every transition in M of the form ((q, c, γ), (r, αβ)) generates one
grammar rule, like the one above, for every pair of states v and w in M,
except s'. Note that v and w may be the same and either or both of them
could be q or r.

• Transitions that push more than two symbols: These transitions can be treated
by extending the technique for two symbols, adding one additional state for
each additional symbol.

The last situation that we need to consider is how to stop. So far, every rule we
have created has some nonterminal on its right-hand side. If G is going to generate
strings composed solely of terminal symbols, it must have a way to eliminate
the final nonterminals once all the terminal symbols have been generated. It can
do this with one rule for every state q in M:

<q, ε, q> → ε.

Read these rules to say that M can start in q, remain in q, having popped nothing,
without consuming any input.

We can now define buildgrammar(M), which assumes that M is in restricted
normal form:

buildgrammar(M: PDA in restricted normal form) =

1. Set Σ_G to Σ_M.

2. Set the start symbol of G to S.

3. Build R as follows:

3.1. Insert the rule S → <s, #, a>.

3.2. For every transition ((q, c, γ), (r, ε)) (i.e., every transition that
pushes no symbols), and every state w, except s', in M do:

Insert the rule <q, γ, w> → c<r, ε, w>.

3.3. For every transition ((q, c, γ), (r, α)) (i.e., every transition that
pushes one symbol), except the one from s', and every state w,
except s', in M do:

Insert the rule <q, γ, w> → c<r, α, w>.

3.4. For every transition ((q, c, γ), (r, αβ)) (i.e., every transition that
pushes two symbols), except the one from s', and every pair of
states v and w, except s', in M do:

Insert the rule <q, γ, w> → c<r, α, v><v, β, w>.

3.5. In a similar way, create rules for transitions that push more than
two symbols.

3.6. For every state q, except s', in M do:

Insert the rule <q, ε, q> → ε.

4. Set V_G to Σ_G ∪ {nonterminal symbols mentioned in the rules inserted
into R}.
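Steps 3.1 through 3.6 can be sketched as follows. This is an illustrative sketch, not a complete implementation: a nonterminal <q, γ, w> is modeled as the tuple (q, γ, w), the empty string stands for ε, a rule is a pair (left-hand side, list of right-hand-side symbols), and step 3.5 (pushes of more than two symbols) is deliberately left out. The function and parameter names are assumptions made for the example. The input is the restricted-normal-form machine M' of Example 12.12, written out by hand.

```python
from itertools import product

def build_grammar(states, delta, new_start="s'", s="s", accept="a"):
    """Return the rules produced by steps 3.1-3.6 for a PDA that is
    already in restricted normal form."""
    targets = [q for q in sorted(states) if q != new_start]
    rules = [("S", [(s, "#", accept)])]                       # step 3.1
    for (q, c, pop), (r, push) in sorted(delta):
        if q == new_start:
            continue                                          # no rules from s'
        head = [c] if c else []                               # epsilon reads nothing
        if len(push) == 0:                                    # step 3.2
            for w in targets:
                rules.append(((q, pop, w), head + [(r, "", w)]))
        elif len(push) == 1:                                  # step 3.3
            for w in targets:
                rules.append(((q, pop, w), head + [(r, push, w)]))
        elif len(push) == 2:                                  # step 3.4
            for v, w in product(targets, repeat=2):
                rules.append(((q, pop, w),
                              head + [(r, push[0], v), (v, push[1], w)]))
        else:
            raise NotImplementedError("step 3.5 is not covered by this sketch")
    for q in targets:                                         # step 3.6
        rules.append(((q, "", q), []))
    return rules

# M' from Example 12.12: one transition from s', nine from s, three from f.
m_prime = {(("s'", "", ""), ("s", "#")), (("f", "", "#"), ("a", "")),
           (("f", "a", "a"), ("f", "")), (("f", "b", "b"), ("f", ""))}
for top in "ab#":
    m_prime |= {(("s", "a", top), ("s", "a" + top)),
                (("s", "b", top), ("s", "b" + top)),
                (("s", "c", top), ("f", top))}

rules = build_grammar({"s'", "s", "f", "a"}, m_prime)
print(len(rules))
```

Most of the rules produced this way mention nonterminals that can never contribute to a derivation, which is the price of generating one rule for every candidate state w (or pair v, w).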

The algorithm buildgrammar creates all the nonterminals and all the rules
required for G to generate exactly the strings in L(M). We should note, however,
that it generally also creates many nonterminals that are useless because they are
either unreachable or unproductive (or both). For example, suppose that, in M,
there is a transition ((q6, c, γ), (q7, α)) from state q6 to state q7, but no path from
state q7 to state q8. Nevertheless, in step 3.3, buildgrammar will insert the rule
<q6, γ, q8> → c<q7, α, q8>. But <q7, α, q8> is unproductive since there are no
strings that drive M from q7 to q8.
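One way to find the unproductive nonterminals is a fixed-point computation: a nonterminal is productive if some rule for it has a right-hand side made up only of terminals and nonterminals already known to be productive. The sketch below assumes the rule representation (left-hand side, list of right-hand-side symbols) and hypothetical helper names; it is meant only to illustrate the q6, q7, q8 situation just described.

```python
def productive_nonterminals(rules, nonterminals):
    """Return the set of nonterminals that can derive some terminal string."""
    productive = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs in productive:
                continue
            # Every symbol must be a terminal or an already-productive nonterminal.
            if all(x not in nonterminals or x in productive for x in rhs):
                productive.add(lhs)
                changed = True
    return productive

# Nonterminals <q, gamma, w> are modeled as tuples; "" stands for epsilon.
n68 = ("q6", "γ", "q8")
n78 = ("q7", "α", "q8")
n88 = ("q8", "", "q8")
rules = [
    (n68, ["c", n78]),   # inserted by step 3.3, although q8 is unreachable from q7
    (n88, []),           # step 3.6: <q8, epsilon, q8> -> epsilon
]
print(productive_nonterminals(rules, {n68, n78, n88}))
```

Because no rule has <q7, α, q8> on its left-hand side, neither it nor <q6, γ, q8> is ever added to the productive set.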

Finally, for an arbitrary PDA M, we define PDAtoCFG:

PDAtoCFG(M: PDA) =

1. Return buildgrammar(convertPDAtorestricted(M)).

EXAMPLE 12.13 Building a Grammar from a PDA

In Example 12.12, we showed a simple PDA for WcWR = {wcw^R : w ∈ {a, b}*}.
Then we converted that PDA to restricted normal form and got M':


Each of the bracket-labeled arcs corresponds to:

[*] ((s, a, #), (s, a#)), ((s, a, a), (s, aa)), ((s, a, b), (s, ab)),
[**] ((s, b, #), (s, b#)), ((s, b, a), (s, ba)), ((s, b, b), (s, bb)), and
[***] ((s, c, #), (f, #)), ((s, c, a), (f, a)), ((s, c, b), (f, b)).

Buildgrammar constructs a grammar G from M'. To see how G works, consider
the parse tree that it builds for the input string abcba. The numbers in brackets at
each node indicate the rule that is applied to the nonterminal at the node.

[Figure: parse tree for abcba. The root S [1] has the single child <s, #, a> [2].]


Here are some of the rules in G. On the left are the transitions of M'. The middle
column contains the rules derived from each transition. The ones marked [x]
in the right column contain useless nonterminals and so cannot be part of any
derivation of a string in L(G). Because there are so many useless rules, we have
omitted the ones generated from all transitions after the first.

                              S → <s, #, a>                        [1]
      ((s', ε, ε), (s, #))    no rules based on the transition from s'
[*]   ((s, a, #), (s, a#))    <s, #, s> → a <s, a, s> <s, #, s>    [x]
                              <s, #, s> → a <s, a, f> <f, #, s>    [x]
                              <s, #, s> → a <s, a, a> <a, #, s>    [x]
                              <s, #, f> → a <s, a, s> <s, #, f>    [x]
                              <s, #, f> → a <s, a, f> <f, #, f>    [x]
                              <s, #, f> → a <s, a, a> <a, #, f>    [x]
                              <s, #, a> → a <s, a, s> <s, #, a>    [x]
                              <s, #, a> → a <s, a, f> <f, #, a>    [2]
                              <s, #, a> → a <s, a, a> <a, #, a>    [x]
      ((s, a, a), (s, aa))    <s, a, f> → a <s, a, f> <f, a, f>    [3]
      ((s, a, b), (s, ab))    <s, b, f> → a <s, a, f> <f, b, f>    [14]
[**]  ((s, b, #), (s, b#))    <s, #, f> → b <s, b, f> <f, #, f>    [15]
      ((s, b, a), (s, ba))    <s, a, f> → b <s, b, f> <f, a, f>    [4]
      ((s, b, b), (s, bb))    <s, b, f> → b <s, b, f> <f, b, f>    [16]
[***] ((s, c, #), (f, #))     <s, #, f> → c <f, #, f>              [17]
      ((s, c, a), (f, a))     <s, a, f> → c <f, a, f>              [18]
      ((s, c, b), (f, b))     <s, b, f> → c <f, b, f>              [5]
      ((f, ε, #), (a, ε))     <f, #, a> → ε <a, ε, a>              [6]
      ((f, a, a), (f, ε))     <f, a, f> → a <f, ε, f>              [7]
      ((f, b, b), (f, ε))     <f, b, f> → b <f, ε, f>              [8]
                              <s, ε, s> → ε                        [19]
                              <f, ε, f> → ε                        [9]
                              <a, ε, a> → ε                        [10]

12.3.3 The Equivalence of Context-free Grammars and PDAs

THEOREM  12.3  PDAs  and  CFGs  Describe  the  Same  Class  of  Languages 

Theorem:  A  language  is  context-free  iff  it  is  accepted  by  some  PDA. 

Proof:  Theorem  12.1  proves  the  only  if  part.  Theorem  12.2  proves  the  if  part. 


12.4  Nondeterminism  and  Halting 

Recall that a computation C of a PDA M = (K, Σ, Γ, Δ, s, A) on a string w is an
accepting computation iff:

C = (s, w, ε) |-M* (q, ε, ε), for some q ∈ A.

We'll say that a computation C of M halts iff at least one of the following conditions
holds:

• C is an accepting computation, or

• C ends in a configuration from which there is no transition in Δ that can be
taken.

We'll say that M halts on w iff every computation of M on w halts. If M halts on w
and does not accept, then we say that M rejects w.

For every context-free language L, we've proven that there exists a PDA M such
that L(M) = L. Suppose that we would like to be able to:

• Examine a string and decide whether or not it is in L.

• Examine a string that is in L and create a parse tree for it.

• Examine a string that is in L and create a parse tree for it in time that is linear in the
length of the string.

• Examine a string and decide whether or not it is in the complement of L.

Do PDAs provide the tools we need to do those things? When we were at a similar
point in our discussion of regular languages, the answer to that question was yes. For
every regular language there exists a minimal deterministic FSM that accepts it.
That minimal DFSM halts on all inputs, accepts all strings that are in L, and rejects all
strings that are not in L.

Unfortunately, the facts about context-free languages and PDAs are different from
the facts about regular languages and FSMs. Now we must face the following:

1.  There  are  context-free  languages  for  which  no  deterministic  PDA  exists.  We’ll 
prove  this  as  Theorem  13.13. 

2. It is possible that a PDA may

• not halt, or

• not ever finish reading its input.

So, let M be a PDA that accepts some language L. Then, on input w, if w ∈ L
then M will halt and accept. But if w ∉ L, while M will not accept w, it is possible
that it will not reject it either. To see how this could happen, let Σ = {a} and
consider the PDA M, shown in Figure 12.4. L(M) = {a}. The computation (1, a, ε)
|- (2, a, a) |- (3, ε, ε) will cause M to accept a. But consider any other input
except a. Observe that:

• M will never halt. There is no accepting configuration, but there is always at
least one computational path that has not yet halted. For example, on input aa,
one such path is:

(1, aa, ε) |- (2, aa, a) |- (1, aa, aa) |- (2, aa, aaa) |-
(1, aa, aaaa) |- (2, aa, aaaaa) |- ...

• M will never finish reading its input unless its input is ε. On input aa, for
example, there is no computation that will read the second a.

3. There exists no algorithm to minimize a PDA. In fact, it is undecidable whether a
PDA is already minimal.


FIGURE  12.4  A  PDA  that  may  neither 
accept  nor  reject. 
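The behavior described in item 2 can be observed with a bounded simulation. The sketch below is hedged in one important way: the three transitions are inferred from the computations quoted in the text, since the figure itself is not reproduced here, and should be read as an assumption. A configuration is (state, unread input, stack), the empty string stands for ε, and the first character of the stack string is its top. The search explores all computations breadth-first and is cut off after max_steps levels, because on most inputs it would otherwise never stop.

```python
# Transitions ((state, input, pop), (state, push)), assumed from the quoted
# computations (1, a, ε) |- (2, a, a) |- (3, ε, ε) and (2, aa, a) |- (1, aa, aa).
DELTA = [
    ((1, "", ""), (2, "a")),
    ((2, "", ""), (1, "a")),
    ((2, "a", "a"), (3, "")),
]
ACCEPTING = {3}

def explore(w, max_steps):
    """Return (accepted, some_path_still_running, fewest_unread_symbols)
    after exploring every computation on w for max_steps moves."""
    frontier = {(1, w, "")}
    accepted, min_unread = False, len(w)
    for _ in range(max_steps):
        frontier = {
            (r, rest[len(c):], push + stack[len(pop):])
            for q, rest, stack in frontier
            for (p, c, pop), (r, push) in DELTA
            if p == q and rest.startswith(c) and stack.startswith(pop)
        }
        for q, rest, stack in frontier:
            min_unread = min(min_unread, len(rest))
            accepted |= q in ACCEPTING and rest == "" and stack == ""
    return accepted, bool(frontier), min_unread

print(explore("a", 10))
print(explore("aa", 50))
```

On input aa the search never reaches an accepting configuration, always has at least one live path, and never gets past the first a, which matches the two observations above.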

Problem 2 is especially critical. This same problem also arose with NDFSMs. But
there we had a choice of two solutions:

• Use ndfsmtodfsm to convert the NDFSM to an equivalent deterministic one. A
DFSM halts on input w in |w| steps.

• Simulate the NDFSM using ndfsmsimulate, which ran all computational paths in
parallel and handled ε-transitions in a way that guaranteed that the simulation of
an NDFSM M on input w halted in |w| steps.

Neither of those approaches works for PDAs. There may not be an equivalent
deterministic PDA. And it is not possible to simulate all paths in parallel on a single PDA
because each path would need its own stack. So what can we do? Solutions to these
problems fall into two classes:

• Formal ones that do not restrict the class of languages that are being considered.
Unfortunately, these approaches generally do restrict the form of the grammars
and PDAs that can be used. For example, they may require that grammars be in
Chomsky or Greibach normal form. As a result, parse trees may not make much
sense. We'll see some of these techniques in Chapter 14.

• Practical ones that work only on a subclass of the context-free languages. But
the subset is large enough to be useful and the techniques can use grammars in
their natural forms. We'll see some of these techniques in Chapters 13 and 15.

12.5  Alternative  Equivalent  Definitions  of  a  PDA* 

We could have defined a PDA somewhat differently. We list here a few reasonable
alternative definitions. In all of them a PDA M is a sextuple (K, Σ, Γ, Δ, s, A):

• We allow M to pop and to push any string in Γ*. In some definitions, M may pop
only a single symbol but it may push any number of them. In some definitions, M
may pop and push only a single symbol.

• In our definition, M accepts its input w only if, when it finishes reading w, it is in an
accepting state and its stack is empty. There are two alternatives to this:

• Accept if, when the input has been consumed, M lands in an accepting state,
regardless of the contents of the stack.

• Accept if, when the input has been consumed, the stack is empty, regardless of
the state M is in.

All of these definitions are equivalent in the sense that, if some language L is
accepted by a PDA using one definition, it can be accepted by some PDA using each of
the other definitions.

We can prove this claim for any pair of definitions by construction. To do so, we show
an algorithm that transforms a PDA of one sort into an equivalent PDA of the other sort.


EXAMPLE  12.14  Accepting  by  Final  State  Alone 

Define a PDA M = (K, Σ, Γ, Δ, s, A) in exactly the way we have except that it
will accept iff it lands in an accepting state, regardless of the contents of the stack.
In other words, if (s, w, ε) |-M* (q, ε, γ) and q ∈ A, then M accepts.

To show that this model is equivalent to ours, we must show two things: For
each of our machines, there exists an equivalent one of these, and, for each of
these, there exists an equivalent one of ours. We'll do the first part to show how
such a construction can be done. We leave the second as an exercise.

Given a PDA M that accepts by accepting state and empty stack, construct a
new PDA M' that accepts by accepting state alone, where L(M') = L(M). M'
will have a single accepting state q_a. The only way for M' to get to q_a will be to
land in an accepting state of M when the stack is logically empty. But there is no
way to check that the stack is empty. So M' will begin by pushing a bottom-of-stack
marker, #, onto the stack. Whenever # is the top symbol on the stack, the
stack is logically empty.

So  the  construction  proceeds  as  follows: 

1. Initially, let M' = M.

2. Create a new start state s'. Add the transition ((s', ε, ε), (s, #)).

3. Create a new accepting state q_a.

4. For each accepting state a in M do:

Add the transition ((a, ε, #), (q_a, ε)).

5. Make q_a the only accepting state in M'.

It is easy to see that M' lands in its only accepting state (q_a) iff M lands in some
accepting state with an empty stack. Thus M' and M accept the same strings.

As an example, we apply this algorithm to the PDA we built for the balanced
parentheses language Bal:


Notice,  by  the  way,  that  while  M  is  deterministic,  M'  is  not. 
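The five-step construction can be checked on small inputs with a sketch like the following. Several things here are assumptions made for illustration: Bal is taken to be accepted by a one-state PDA that pushes each left parenthesis and pops one for each right parenthesis, the names s' and qa are chosen for the new states, the empty string stands for ε, and acceptance is decided by a bounded breadth-first search over configurations rather than by any method from the text.

```python
def to_final_state_acceptance(delta, start, accepting,
                              new_start="s'", new_accept="qa"):
    """Steps 1-5: build M', which accepts by accepting state alone."""
    new_delta = set(delta)                                         # step 1
    new_delta.add(((new_start, "", ""), (start, "#")))             # step 2
    new_delta |= {((a, "", "#"), (new_accept, "")) for a in accepting}  # 3-4
    return new_delta, new_start, {new_accept}                      # step 5

def accepts(delta, start, accepting, w, need_empty_stack, max_steps=100):
    """Bounded search over configurations (state, unread input, stack)."""
    frontier = {(start, w, "")}
    for _ in range(max_steps):
        for q, rest, stack in frontier:
            if q in accepting and rest == "" and (stack == "" or not need_empty_stack):
                return True
        frontier = {
            (r, rest[len(c):], push + stack[len(pop):])
            for q, rest, stack in frontier
            for (p, c, pop), (r, push) in delta
            if p == q and rest.startswith(c) and stack.startswith(pop)
        }
    return False

# Assumed PDA for Bal: accepts by accepting state and empty stack.
bal = {(("s", "(", ""), ("s", "(")), (("s", ")", "("), ("s", ""))}
bal2, start2, accepting2 = to_final_state_acceptance(bal, "s", {"s"})

for w in ["", "()", "(())()", "(()", ")(", "())"]:
    original = accepts(bal, "s", {"s"}, w, need_empty_stack=True)
    converted = accepts(bal2, start2, accepting2, w, need_empty_stack=False)
    print(repr(w), original, converted)
```

In the converted machine, state s with # on top of the stack can either read a left parenthesis or take the ε-move to qa, which is where the nondeterminism noted above comes from.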


12.6  Alternatives  that  are  Not  Equivalent  to  the  PDA  • 

We defined a PDA to be a finite state machine to which we add a single stack. We
mention here two variants of that definition, each of which turns out to define a more
powerful class of machine. In both cases, we'll still start with an FSM.

For the first variation, we add a first-in, first-out (FIFO) queue in place of the stack.
Such machines are called tag systems or Post machines. As we'll see in Section 18.2.3,
tag systems are equivalent to Turing machines in computational power.

For the second variation, we add two stacks instead of one. Again, the resulting
machines are equivalent in computational power to Turing machines, as we'll see in
Section 17.5.2.

Exercises 

1. Build a PDA to accept each of the following languages L:

a. BalDelim = {w : where w is a string of delimiters: (, ), [, ], {, }, that are
properly balanced}.

b. {a^i b^j : 2i = 3j + 1}.

c. {w ∈ {a, b}* : #a(w) = 2·#b(w)}.

d. {a^n b^m : m ≤ n ≤ 2m}.

e. {w ∈ {a, b}* : w = w^R}.

f. {a^i b^j c^k : i, j, k ≥ 0 and (i ≠ j or j ≠ k)}.

g. {w ∈ {a, b}* : every prefix of w has at least as many a's as b's}.

h. {a^n b^m a^n : n, m ≥ 0 and m is even}.

i. {xc^n : x ∈ {a, b}*, #a(x) = n or #b(x) = n}.

j. {a^n b^m : m ≥ n, m − n is even}.

k. {a^m b^n c^p d^q : m, n, p, q ≥ 0 and m + n = p + q}.

l. {b_i#b_{i+1}^R : b_i is the binary representation of some integer i, i ≥ 0, without
leading zeros}. (For example 101#011 ∈ L.)

m. {x^R#y : x, y ∈ {0, 1}* and x is a substring of y}.

n. L1*, where L1 = {xx^R : x ∈ {a, b}*}.

2. Complete the PDA that we sketched, in Example 12.8, for ¬AnBnCn, where
AnBnCn = {a^n b^n c^n : n ≥ 0}.

3. Let L = {ba^{m_1}ba^{m_2}ba^{m_3}...ba^{m_n} : n ≥ 2, m_1, m_2, ..., m_n ≥ 0, and m_i ≠ m_j for
some i, j}.

a. Show a PDA that accepts L.

b. Show a context-free grammar that generates L.

c. Prove that L is not regular.

4. Consider the language L = L1 ∩ L2, where L1 = {ww^R : w ∈ {a, b}*} and
L2 = {a^n b* a^n : n ≥ 0}.

a. List the first four strings in the lexicographic enumeration of L.

b. Write a context-free grammar to generate L.

c. Show a natural PDA for L. (In other words, don't just build it from the grammar
using one of the two-state constructions presented in this chapter.)

d. Prove that L is not regular.

5. Build a deterministic PDA to accept each of the following languages:

a. L$, where L = {w ∈ {a, b}* : #a(w) = #b(w)}.

b. L$, where L = {a^n b^+ a^m : n ≥ 0 and ∃k ≥ 0 (m = 2k + n)}.

6. Complete the proof that we started in Example 12.14. Specifically, show that if
M is a PDA that accepts by accepting state alone, then there exists a PDA M'
that accepts by accepting state and empty stack (our definition) where
L(M') = L(M).


CHAPTER  13 


Context-Free  and  Noncontext-Free 
Languages 

The language AnBn = {a^n b^n : n ≥ 0} is context-free. The language AnBnCn =
{a^n b^n c^n : n ≥ 0} is not context-free (intuitively because a PDA's stack cannot count
all three of the letter regions and compare them). PalEven = {ww^R : w ∈ {a, b}*} is
context-free. The similar language WW = {ww : w ∈ {a, b}*} is not context-free (again,
intuitively, because a stack cannot pop the characters of w off in the same order in which
they were pushed).

Given a new language L, how can we know whether or not it is context-free? In this
chapter, we present a collection of techniques that can be used to answer that question.


13.1  Where  Do  the  Context-Free  Languages  Fit  in  the 
Big  Picture? 

First,  we  consider  the  relationship  between  the  regular  languages  and  the  context-free 
languages. 


THEOREM  13.1  The  Context-Free  Languages  Properly  Contain  the  Regular 
Languages 

Theorem: The regular languages are a proper subset of the context-free languages.

Proof: We first show that every regular language is context-free. We then show that
there exists at least one context-free language that is not regular.

We show that every regular language is context-free by construction. If L is
regular, then it is accepted by some DFSM M = (K, Σ, δ, s, A). From M we construct
a PDA M' = (K', Σ', Γ', Δ', s', A') to accept L. In essence, M' will simply
be M and will ignore the stack. Let M' be (K, Σ, ∅, Δ', s, A), where Δ' is constructed
as follows: For every transition (q_i, c, q_j) in δ, add to Δ' the transition
((q_i, c, ε), (q_j, ε)). M' behaves identically to M, so L(M) = L(M'). So the regular
languages are a subset of the context-free languages.

The regular languages are a proper subset of the context-free languages because
there exists at least one language, AnBn, that is context-free but not regular.
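The construction in the proof is short enough to write out directly. In this sketch the DFSM transition function is assumed to be a Python dictionary from (state, symbol) to state, the empty string stands for ε, and the example machine (strings over {a, b} with an even number of a's) is an illustration that does not come from the text.

```python
def dfsm_to_pda(states, sigma, delta, start, accepting):
    """Return the PDA (K, Sigma, Gamma, Delta', s, A) that mirrors the DFSM
    and never touches its stack."""
    pda_delta = {((q, c, ""), (r, "")) for (q, c), r in delta.items()}
    return set(states), set(sigma), set(), pda_delta, start, set(accepting)

def dfsm_accepts(delta, start, accepting, w):
    q = start
    for c in w:
        q = delta[(q, c)]
    return q in accepting

def pda_accepts(pda_delta, start, accepting, w):
    # The stack stays empty, so a configuration is just (state, unread input).
    configs = {(start, w)}
    for _ in range(len(w)):
        configs = {(r, rest[1:])
                   for q, rest in configs
                   for (p, c, _pop), (r, _push) in pda_delta
                   if p == q and rest[:1] == c}
    return any(q in accepting for q, _ in configs)

# Illustrative DFSM: accepts strings with an even number of a's.
delta = {("even", "a"): "odd", ("even", "b"): "even",
         ("odd", "a"): "even", ("odd", "b"): "odd"}
K, sigma, gamma, pda_delta, s, A = dfsm_to_pda(
    {"even", "odd"}, {"a", "b"}, delta, "even", {"even"})

for w in ["", "a", "ab", "aba", "bbab", "aabaa"]:
    print(repr(w), dfsm_accepts(delta, "even", {"even"}, w),
          pda_accepts(pda_delta, s, A, w))
```

Since every transition pops ε and pushes ε, the stack is empty whenever the input has been consumed, so acceptance by accepting state and empty stack coincides with acceptance by the original DFSM.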


Next, we observe that there are many more noncontext-free languages than there
are context-free ones:

THEOREM 13.2 How Many Context-Free Languages are There?

Theorem: There is a countably infinite number of context-free languages.

Proof: Every context-free language is generated by some context-free grammar
G = (V, Σ, R, S). We can encode the elements of V as binary strings, so we can
lexicographically enumerate all the syntactically legal context-free grammars.
There cannot be more context-free languages than there are context-free grammars,
so there is at most a countably infinite number of context-free languages.
There is not a one-to-one relationship between context-free languages and
context-free grammars since there is an infinite number of grammars that generate
any given language. But, by Theorem 13.1, every regular language is context-free.
And, by Theorem 8.1, there is a countably infinite number of regular languages.
So there is at least and at most a countably infinite number of
context-free languages.

But, by Theorem 2.3, there is an uncountably infinite number of languages over any
nonempty alphabet Σ. So there are many more noncontext-free languages than there
are context-free ones.


13.2  Showing  That  a  Language  is  Context-Free 

We  have  so  far  seen  two  techniques  that  can  be  used  to  show  that  a  language  L  is 
context-free: 

•  Exhibit  a  context-free  grammar  for  it. 

•  Exhibit  a  (possibly  nondeterministic)  PDA  for  it. 

There are also closure theorems for context-free languages and they can be used
to show that a language is context-free if it can be described in terms of other
languages whose status is already known. Unfortunately, there are fewer closure
theorems for the context-free languages than there are for the regular languages. In
order to be able to discuss both the closure theorems that exist, as well as the ones
we'd like but don't have, we will wait and consider the issue of closure theorems in
Section 13.4, after we have developed a technique for showing that a language is not
context-free.

FIGURE 13.1 The structure of a parse tree.

13.3  The  Pumping  Theorem  for  Context-Free  Languages 

Suppose  we  are  given  a  language  and  we  want  to  prove  that  it  is  not  context-free.  Just 
as  with  regular  languages,  it  is  not  sufficient  simply  to  claim  that  we  tried  to  build  a 
grammar  or  a  PDA  and  we  failed. That  doesn't  show  that  there  isn’t  some  other  way  to 
approach  the  problem. 

Instead, we will again approach this problem from the other direction. We will
articulate a property that is provably true of all context-free languages. Then, if we can
show that a language L does not possess this property, we know that L is not
context-free. So, just as we did when we used the Pumping Theorem for regular languages,
we will construct proofs by contradiction. We will say, "If L were context-free, then it
would possess certain properties. But it does not possess those properties. Therefore, it
is not context-free."

This time we exploit the fact that every context-free language is generated by some
context-free grammar. The argument we are about to make is based on the structure of
parse trees. Recall that a parse tree, derived by a grammar G = (V, Σ, R, S), is a rooted,
ordered tree in which:

• Every leaf node is labeled with an element of Σ ∪ {ε},

• The root node is labeled S,

• Every other node is labeled with some element of V − Σ, and

• If m is a nonleaf node labeled X and the children of m are labeled x₁, x₂, ..., xₙ,
then the rule X → x₁x₂...xₙ is in R.

Consider an arbitrary parse tree, as shown in Figure 13.1. The height of a tree is the
length of the longest path from the root to any leaf. The branching factor of a tree is
the largest number of daughters of any node in the tree. The yield of a tree is the
ordered sequence of its leaf nodes.


THEOREM 13.3 The Height of a Tree and its Branching Factor Put a
Bound on its Yield


Theorem: The length of the yield of any tree T with height h and branching factor
b is ≤ b^h.

Proof: The proof is by induction on h. If h is 1, then just a single rule applies. So the
longest yield is of length less than or equal to b. Assume the claim is true for
h = n. We show that it is true for h = n + 1. Consider any tree with h = n + 1.
It consists of a root, and some number of subtrees, each of which is of height ≤ n.
By the induction hypothesis, the length of the yield of each of those subtrees is
≤ b^n. The number of subtrees of the root is ≤ b. So the length of the yield must be
≤ b(b^n) = b^(n+1) = b^h.
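The bound in Theorem 13.3 can be checked on a concrete tree. The following is a minimal sketch, not part of the original text: the `Node` class and the three helper functions are hypothetical names introduced only for illustration, and the sample tree is one that the rules X → aYb, Y → bXa, and X → ab could produce.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A tree node; a node with no children is a leaf (illustrative type)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def height(t: Node) -> int:
    """Length, in edges, of the longest path from the root to a leaf."""
    return 0 if not t.children else 1 + max(height(c) for c in t.children)


def branching_factor(t: Node) -> int:
    """Largest number of daughters of any node in the tree."""
    if not t.children:
        return 0
    return max(len(t.children), *(branching_factor(c) for c in t.children))


def yield_length(t: Node) -> int:
    """Number of leaves of the tree."""
    return 1 if not t.children else sum(yield_length(c) for c in t.children)


# Tree built with X -> aYb at the root, then Y -> bXa, then X -> ab.
tree = Node("X", [
    Node("a"),
    Node("Y", [Node("b"), Node("X", [Node("a"), Node("b")]), Node("a")]),
    Node("b"),
])

h, b, n = height(tree), branching_factor(tree), yield_length(tree)
assert n <= b ** h  # the bound of Theorem 13.3
```

Here the tree has height 3, branching factor 3, and a yield of length 6, comfortably below 3^3 = 27. The bound is tight only when every nonleaf node has exactly b daughters.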

Let G = (V, Σ, R, S) be a context-free grammar. Let n = |V − Σ| be the number
of nonterminal symbols in G. Let b be the branching factor of G, defined to be the
length of the longest right-hand side of any rule in R.

Now consider any parse tree T generated by G. Suppose that no nonterminal
appears more than once on any one path from the root of T to a leaf. Then the
height of T is ≤ n. So the longest string that could correspond to the yield of T has
length ≤ b^n.

Now suppose that w is a string in L(G) and |w| > b^n. Then any parse tree that G
generates for w must contain at least one path that contains at least one repeated
nonterminal. Another way to think of this is that, to derive w, G must have used at least
one recursive rule. So any parse tree for w must look like the one shown in Figure 13.2,
where X is some repeated nonterminal. We use dotted lines to make it clear that the
derivation may not be direct but may, instead, require several steps. So, for example, it
is possible that the tree shown here was derived using a grammar that contained the
rules X → aYb, Y → bXa, and X → ab.

Of course, it is possible that w has more than one parse tree. For the rest of this
discussion we will pick some tree such that G generates no other parse tree for w that has
fewer nodes. Within that tree it is possible that there are many repeated nonterminals
and that some of them are repeated more than once. We will assume only that we have
chosen point [1] in the tree such that X is the first repeated nonterminal on any path,
coming up from the bottom, in the subtree rooted at [1]. We'll call the rule that was
applied at [1] rule₁ and the rule that was applied at [2] rule₂.

We  can  sketch  the  derivation  that  produced  this  tree  as: 

S ⇒* uXz ⇒* uvXyz ⇒* uvxyz.


FIGURE 13.2 A parse tree whose height is greater than n.

So we have carved w up into five pieces: u, v, x, y, and z. We observe that:

• There is another derivation in G, S ⇒* uXz ⇒* uxz, in which, at the point labeled
[1], the nonrecursive rule₂ is used. So uxz is also in L(G).

• There are infinitely many derivations in G, such as S ⇒* uXz ⇒* uvXyz ⇒*
uvvXyyz ⇒* uvvxyyz, in which the recursive rule₁ is applied one or more
additional times before the nonrecursive rule₂ is used. Those derivations produce the
strings uv^2 xy^2 z, uv^3 xy^3 z, etc. So all of those strings are also in L(G).

• It is possible that v = ε, as it would be, for example, if rule₁ were X → Xa. It is also
possible that y = ε, as it would be, for example, if rule₁ were X → aX. But it is not possible
that both v and y are ε. If they were, then the derivation S ⇒* uXz ⇒* uxz would also
yield w and it would create a parse tree with fewer nodes. But that contradicts the
assumption that we started with a tree with the smallest possible number of nodes.

• The height of the subtree rooted at [1] is at most n + 1 (since there is one repeated
nonterminal and every other nonterminal can occur no more than once). So
|vxy| ≤ b^(n+1).

These observations are the basis for the context-free Pumping Theorem, which we
state next.

THEOREM  13.4  The  Pumping  Theorem  for  Context-Free  Languages 

Theorem: If L is a context-free language, then:

∃k ≥ 1 (∀ strings w ∈ L, where |w| ≥ k (∃u, v, x, y, z
    (w = uvxyz,
    vy ≠ ε,
    |vxy| ≤ k, and
    ∀q ≥ 0 (uv^q xy^q z is in L)))).

Proof: The proof is the argument that we gave above: If L is context-free, then it is
generated by some context-free grammar G = (V, Σ, R, S) with n nonterminal symbols
and branching factor b. Let k be b^(n+1). Any string that can be generated by G and
whose parse tree contains no paths with repeated nonterminals must have length less
than or equal to b^n. Assuming that b ≥ 2, it must be the case that b^(n+1) > b^n. So let
w be any string in L(G) where |w| ≥ k. Let T be any smallest parse tree for w (i.e.,
a parse tree such that no other parse tree for w has fewer nodes). T must have height
at least n + 1. Choose some path in T of length at least n + 1. Let X be the
bottommost repeated nonterminal along that path. Then w can be rewritten as uvxyz as
shown in the tree diagram of Figure 13.2. The tree rooted at [1] has height at most
n + 1. Thus its yield, vxy, has length less than or equal to b^(n+1), which is k. Further,
vy ≠ ε since if vy were ε then there would be a smaller parse tree for w and we chose
T so that that wasn't so. Finally, v and y can be pumped: uxz must be in L because
rule₂ could have been used immediately at [1]. And, for any q ≥ 1, uv^q xy^q z must be
in L because rule₁ could have been used q times before finally using rule₂.

So, if L is a context-free language, every "long" string in L must be pumpable. Just as
with the Pumping Theorem for regular languages, the pumped region can be pumped
out once or pumped in any number of times, in all cases resulting in another string that
is also in L. So, if there is even one "long" string in L that is not pumpable, then L is not
context-free.
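To make "pumping out" and "pumping in" concrete, here is a small sketch, assuming the context-free language AⁿBⁿ = {a^n b^n : n ≥ 0} as the example and one particular division of w = aaabbb. The function names `pump` and `in_anbn` are hypothetical helpers, not notation from the text.

```python
def pump(u, v, x, y, z, q):
    """Return the string u v^q x y^q z."""
    return u + v * q + x + y * q + z


def in_anbn(s):
    """Membership test for {a^n b^n : n >= 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


# One division of w = aaabbb, corresponding to a repeated nonterminal
# in a grammar such as S -> aSb | ε (an assumed grammar for this sketch).
u, v, x, y, z = "aa", "a", "", "b", "bb"
assert pump(u, v, x, y, z, 1) == "aaabbb"

pumped = [pump(u, v, x, y, z, q) for q in range(4)]
# q = 0 pumps out once; q = 2, 3 pump in once and twice.
assert all(in_anbn(s) for s in pumped)
```

Because v and y are pumped in tandem, the numbers of a's and b's stay equal for every q, which is exactly why this division works for AⁿBⁿ.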

Note that the value k plays two roles in the Pumping Theorem. It defines what we
mean by a "long" string and it imposes an upper bound on |vxy|. When we set k to
b^(n+1), we guaranteed that it was large enough so that we could prove that it served both
of those purposes. But we should point out that a smaller value would have sufficed as
the definition for a "long" string, since any string of length greater than b^n must be
pumpable.

There are a few important ways in which the context-free Pumping Theorem differs
from the regular one:

• The most obvious is that two regions, v and y, must be pumped in tandem.

• We don't know anything about where the strings v and y will fall. All we know is
that they are reasonably "close together", i.e., |vxy| ≤ k.

• Either v or y could be empty, although not both.


EXAMPLE 13.1 AⁿBⁿCⁿ is Not Context-Free


Let L = AⁿBⁿCⁿ = {a^n b^n c^n : n ≥ 0}. We can use the Pumping Theorem to show
that L is not context-free. If it were, then there would exist some k such that any
string w, where |w| ≥ k, must satisfy the conditions of the theorem. We show one
string w that does not. Let w = a^k b^k c^k, where k is the constant from the Pumping
Theorem. For w to satisfy the conditions of the Pumping Theorem, there must
be some u, v, x, y, and z such that w = uvxyz, vy ≠ ε, |vxy| ≤ k, and ∀q ≥ 0
(uv^q xy^q z is in L). We show that no such u, v, x, y, and z exist. If either v or y
contains two or more different characters, then set q to 2 (i.e., pump in once) and the
resulting string will have letters out of order and thus not be in AⁿBⁿCⁿ. (For
example, if v is aabb and y is cc, then the string that results from pumping will look
like aaa...aaabbaabbccc...ccc.) If both v and y each contain at most one
distinct character then set q to 2. Additional copies of at most two different
characters are added, leaving the third unchanged. There are no longer equal numbers
of the three letters, so the resulting string is not in AⁿBⁿCⁿ. There is no way to
divide w into uvxyz such that all the conditions of the Pumping Theorem are met.
So AⁿBⁿCⁿ is not context-free.
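The case analysis in Example 13.1 can be sanity-checked by brute force for small values of k. This is only a sketch under stated assumptions: the real k of the Pumping Theorem is unknown, so the check below illustrates the argument for a few chosen k rather than replacing the proof. The helpers `in_anbncn`, `decompositions`, and `surviving_splits` are hypothetical names.

```python
def in_anbncn(s):
    """Membership test for {a^n b^n c^n : n >= 0}."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n


def decompositions(w, k):
    """Yield every (u, v, x, y, z) with w = uvxyz, vy nonempty, |vxy| <= k."""
    n = len(w)
    for i in range(n + 1):                  # u = w[:i]
        top = min(i + k, n)                 # vxy must end by position i + k
        for j in range(i, top + 1):         # v = w[i:j]
            for l in range(j, top + 1):     # x = w[j:l]
                for m in range(l, top + 1): # y = w[l:m]
                    if (j - i) + (m - l) > 0:
                        yield w[:i], w[i:j], w[j:l], w[l:m], w[m:]


def surviving_splits(w, k, member, qs=(0, 2)):
    """Divisions for which every tested q leaves the pumped string in the language."""
    return [
        (u, v, x, y, z)
        for u, v, x, y, z in decompositions(w, k)
        if all(member(u + v * q + x + y * q + z) for q in qs)
    ]


for k in range(1, 5):
    w = "a" * k + "b" * k + "c" * k
    # No division of w survives pumping with q = 0 and q = 2.
    assert surviving_splits(w, k, in_anbncn) == []
```

Testing only q = 0 and q = 2 is enough here because the example refutes every division by pumping in once; a division that fails for one value of q already violates the theorem's ∀q condition.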


As with the Pumping Theorem for regular languages, it requires some skill to design
simple and effective proofs using the context-free Pumping Theorem. As before, the
choices that we can make, when trying to show that a language L is not context-free, are:

• We choose w, the string to be pumped. It is important to choose w so that it is in the
part of L that captures the essence of why L is not context-free.

• We choose a value for q that shows that w isn't pumpable.


• We may apply closure theorems before we start, so that we show that L is not
context-free by showing that some other language L′ isn't. We'll have more to say
about this technique later.


EXAMPLE 13.2 The Language of Strings with n^2 a's is Not Context-Free

Let L = {a^(n^2) : n ≥ 0}. We can use the Pumping Theorem to show that L is not
context-free. If it were, then there would exist some k such that any string w,
where |w| ≥ k, must satisfy the conditions of the theorem. We show one string w
that does not. Let n (in the definition of L) be k^2. So n^2 = k^4 and w = a^(k^4). For w
to satisfy the conditions of the Pumping Theorem, there must be some u, v, x, y,
and z, such that w = uvxyz, vy ≠ ε, |vxy| ≤ k, and ∀q ≥ 0 (uv^q xy^q z is in L). We
show that no such u, v, x, y, and z exist. Since w contains only a's, vy = a^p, for
some nonzero p. Set q to 2. The resulting string, which we'll call s, is a^(k^4 + p), which
must be in L. But it isn't because it is too short. If a^(k^4), which contains (k^2)^2 a's, is in
L, then the next longer element of L contains (k^2 + 1)^2 a's. That's k^4 + 2k^2 + 1
a's. So there are no strings in L with length between k^4 and k^4 + 2k^2 + 1. But
|s| = k^4 + p. So, for s to be in L, p = |vy| would have to be at least 2k^2 + 1. But
|vy| ≤ k, so p can't be that large. Thus s is not in L. There is no way to divide w
into uvxyz such that all the conditions of the Pumping Theorem are met. So L is
not context-free.
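The arithmetic at the heart of Example 13.2 is that k^4 + p is never a perfect square when 1 ≤ p ≤ k, because the next square after k^4 is k^4 + 2k^2 + 1. The following sketch checks that claim numerically for a range of k; the function names are hypothetical and the range of k is an arbitrary choice for illustration.

```python
import math


def is_square(m):
    """True iff m is a perfect square."""
    r = math.isqrt(m)
    return r * r == m


def pumped_lengths_in_L(k):
    """Values of p in 1..k for which a^(k^4 + p) would be in {a^(n^2)}."""
    return [p for p in range(1, k + 1) if is_square(k ** 4 + p)]


# For every k tried, pumping in once (q = 2) always leaves the language.
assert all(pumped_lengths_in_L(k) == [] for k in range(1, 200))
```

The check agrees with the algebra: k^4 < k^4 + p ≤ k^4 + k < (k^2 + 1)^2 for every k ≥ 1.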


When using the Pumping Theorem, we focus on v and y. Once they are specified, so
are u, x, and z.

To show that there exists no v, y pair that satisfies all of the conditions of the
Pumping Theorem, it is sometimes necessary to enumerate a set of cases and rule them out
one at a time. Sometimes the easiest way to do this is to imagine the string to be
pumped as divided into a set of regions. Then we can consider all the ways in which v
and y can fall across those regions.

EXAMPLE  13.3  Dividing  the  String  w  Into  Regions 

Let L = {a^n b^m a^n : n, m ≥ 0 and n ≥ m}. We can use the Pumping Theorem to
show that L is not context-free. If it were, then there would exist some k such that
any string w, where |w| ≥ k, must satisfy the conditions of the theorem. We show
one string w that does not. Let w = a^k b^k a^k, where k is the constant from the
Pumping Theorem. For w to satisfy the conditions of the Pumping Theorem, there
must be some u, v, x, y, and z, such that w = uvxyz, vy ≠ ε, |vxy| ≤ k, and
∀q ≥ 0 (uv^q xy^q z is in L). We show that no such u, v, x, y, and z exist. Imagine w
divided into three regions as follows:

aaa ... aaabbb ... bbbaaa ... aaa
|    1    |    2    |    3    |


We consider all the cases for where v and y could fall and show that in none of
them are all the conditions of the theorem met:

• If either v or y crosses regions, then set q to 2 (thus pumping in once). The
resulting string will have letters out of order and so not be in L. So in all the
remaining cases we assume that v and y each falls within a single region.

• (1, 1): Both v and y fall in region 1. Set q to 2. In the resulting string, the first
group of a's is longer than the second group of a's. So the string is not in L.

• (2, 2): Both v and y fall in region 2. Set q to 2. In the resulting string, the b
region is longer than either of the a regions. So the string is not in L.

• (3, 3): Both v and y fall in region 3. Set q to 0. In the resulting string, the first
group of a's is again longer than the second, so the argument for (1, 1) applies.

• (1, 2): Nonempty v falls in region 1 and nonempty y falls in region 2. (If either
v or y is empty, it does not matter where it falls. So we can treat it as though it
falls in the same region as the nonempty one. We have already considered all
of those cases.) Set q to 2. In the resulting string, the first group of a's is longer
than the second group of a's. So the string is not in L.

• (2, 3): Nonempty v falls in region 2 and nonempty y falls in region 3. Set q to 2.
In the resulting string the second group of a's is longer than the first group of
a's. So the string is not in L.

• (1, 3): Nonempty v falls in region 1 and nonempty y falls in region 3. If this
were allowed by the other conditions of the Pumping Theorem, we could
pump in a's and still produce strings in L. But if we pumped out, we would
violate the requirement that the a regions be at least as long as the b region.
More importantly, this case violates the requirement that |vxy| ≤ k. So it
need not be considered.

There is no way to divide w into uvxyz such that all the conditions of the
Pumping Theorem are met. So L is not context-free.
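The region-by-region analysis of Example 13.3 can be tabulated mechanically for a small k. The sketch below is an illustration under assumptions, not the text's method: it fixes k = 3, enumerates every division of w = a^k b^k a^k with |vxy| ≤ k and vy ≠ ε, records which regions the nonempty parts of v and y touch, and checks whether q = 0 or q = 2 refutes the division. The names `in_L` and `analyse` are hypothetical.

```python
import re


def in_L(s):
    """Membership in {a^n b^m a^n : n, m >= 0 and n >= m}."""
    if "b" not in s:  # m = 0: the string must be an even number of a's
        return set(s) <= {"a"} and len(s) % 2 == 0
    m = re.fullmatch(r"(a*)(b+)(a*)", s)
    return bool(m) and len(m[1]) == len(m[3]) and len(m[1]) >= len(m[2])


def analyse(k):
    """Map each set of regions touched by vy to True iff every such division is refuted."""
    w = "a" * k + "b" * k + "a" * k
    n = len(w)
    refuted = {}
    for i in range(n + 1):
        top = min(i + k, n)  # enforce |vxy| <= k
        for j in range(i, top + 1):
            for l in range(j, top + 1):
                for m in range(l, top + 1):
                    if j == i and m == l:  # vy must be nonempty
                        continue
                    u, v, x, y, z = w[:i], w[i:j], w[j:l], w[l:m], w[m:]
                    positions = [*range(i, j), *range(l, m)]
                    regions = tuple(sorted({1 + p // k for p in positions}))
                    bad = any(not in_L(u + v * q + x + y * q + z) for q in (0, 2))
                    refuted[regions] = refuted.get(regions, True) and bad
    return refuted


result = analyse(3)
assert all(result.values())                           # every division is refuted
assert not any(1 in r and 3 in r for r in result)     # case (1, 3) never arises
```

The second assertion reflects the last bullet of the example: a division touching both region 1 and region 3 would need |vxy| ≥ k + 2, so the enumeration never produces one.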


Consider the language PalEven = {ww^R : w ∈ {a, b}*}, the language of even-length
palindromes of a's and b's, which we introduced in Example 11.3. Let w be any
string in PalEven. Then substrings of w are related to each other in a perfectly nested
way, as shown in Figure 13.3 (a). Nested relationships of this sort can naturally be
described with a context-free grammar, so languages whose strings are structured in this
way are typically context-free.

But now consider the case in which the relationships are not properly nested but
instead cross. For example, consider the language WcW = {wcw : w ∈ {a, b}*}. Now let
w be any string in WcW. Then substrings of w are related to each other as shown in
Figure 13.3 (b). We call such dependencies, where lines cross each other, cross-serial
dependencies. Languages whose strings are characterized by cross-serial dependencies
are typically not context-free.

FIGURE 13.3 Nested versus cross-serial dependencies: (a) nested, on the string
a a b b a a; (b) cross-serial, on the string a a b c a a b.

EXAMPLE  13.4  WcW  is  Not  Context-Free 

Let WcW = {wcw : w ∈ {a, b}*}. WcW is not context-free. All its nonempty
strings contain cross-serial dependencies.

We can use the Pumping Theorem to show that WcW is not context-free. If it
were, then there would exist some k such that any string w, where |w| ≥ k, must
satisfy the conditions of the theorem. We show one string w that does not. Let w =
a^k b^k c a^k b^k, where k is the constant from the Pumping Theorem. For w to satisfy the
conditions of the Pumping Theorem, there must be some u, v, x, y, and z, such that
w = uvxyz, vy ≠ ε, |vxy| ≤ k, and ∀q ≥ 0 (uv^q xy^q z is in WcW). We show that
no such u, v, x, y, and z exist. Imagine w divided into five regions as follows:

aaa ... aaabbb ... bbbcaaa ... aaabbb ... bbb
|    1    |    2    |3|    4    |    5    |

Call the part before the c the left side and the part after the c the right side. We
consider all the cases for where v and y could fall and show that in none of them
are all the conditions of the theorem met:

• If either v or y overlaps region 3, set q to 0. The resulting string will no longer
contain a c and so is not in WcW.

• If both v and y occur before region 3 or they both occur after region 3, then set
q to 2. One side will be longer than the other and so the resulting string is not
in WcW.

• If either v or y overlaps region 1, then set q to 2. In order to make the right side
match, something would have to be pumped into region 4. But any v, y pair
that did that would violate the requirement that |vxy| ≤ k.

• If either v or y overlaps region 2, then set q to 2. In order to make the right
side match, something would have to be pumped into region 5. But any v, y
pair that did that would violate the requirement that |vxy| ≤ k.

There is no way to divide w into uvxyz such that all the conditions of the
Pumping Theorem are met. So WcW is not context-free.
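As with the earlier examples, the argument for WcW can be checked exhaustively for small k. This is a sketch only: the true constant k is unknown, and the helper names `in_wcw` and `unpumpable` are hypothetical.

```python
def in_wcw(s):
    """Membership test for {wcw : w in {a, b}*}."""
    left, sep, right = s.partition("c")
    return sep == "c" and left == right and set(left) <= {"a", "b"}


def unpumpable(k):
    """True iff no division of a^k b^k c a^k b^k survives pumping with q = 0 and q = 2."""
    w = "a" * k + "b" * k + "c" + "a" * k + "b" * k
    n = len(w)
    for i in range(n + 1):
        top = min(i + k, n)  # enforce |vxy| <= k
        for j in range(i, top + 1):
            for l in range(j, top + 1):
                for m in range(l, top + 1):
                    if j == i and m == l:  # vy must be nonempty
                        continue
                    u, v, x, y, z = w[:i], w[i:j], w[j:l], w[l:m], w[m:]
                    if all(in_wcw(u + v * q + x + y * q + z) for q in (0, 2)):
                        return False  # this division can be pumped
    return True


assert all(unpumpable(k) for k in range(1, 5))
```

The only divisions that straddle the c without containing it put v among the b's of region 2 and y among the a's of region 4, and pumping those makes the two sides differ.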


Are programming languages like C++ and Java context-free? (G.2)

The language WcW, which we just showed is not context-free, is important because of
its similarity to the structure of many common programming languages. Consider a
programming language that requires that variables be declared before they are used. If we
consider just a single variable w, then a program that declares w and then uses it has a
structure very similar to the strings in the language WcW, since the string w must occur
in exactly the same form in both the declaration section and the body of the program.

13.4  Some  Important  Closure  Properties  of  Context-Free 
Languages 

It helps to be able to analyze a complex language by decomposing it into simpler
pieces. Closure theorems, when they exist, enable us to do that. We'll see in this
section that, while the context-free languages are closed under some common operations,
we cannot prove as strong a set of closure theorems as we were able to prove for the
regular languages.

13.4.1  The  Closure  Theorems 

THEOREM  13.5  Closure  Under  Union,  Concatenation,  Kleene  Star,  Reverse, 
and  Letter  Substitution 

Theorem: The context-free languages are closed under union, concatenation,
Kleene star, reverse, and letter substitution.

Proof:  We  prove  each  of  the  claims  separately  by  construction: 

• The context-free languages are closed under union: If L₁ and L₂ are
context-free languages, then there exist context-free grammars G₁ = (V₁, Σ₁, R₁, S₁)
and G₂ = (V₂, Σ₂, R₂, S₂) such that L₁ = L(G₁) and L₂ = L(G₂). If necessary,
rename the nonterminals of G₁ and G₂ so that the two sets are disjoint and so
that neither includes the symbol S. We will build a new grammar G such that
L(G) = L(G₁) ∪ L(G₂). G will contain all the rules of both G₁ and G₂. We
add to G a new start symbol, S, and two new rules, S → S₁ and S → S₂. The two
new rules allow G to generate a string iff at least one of G₁ or G₂ generates it.
So G = (V₁ ∪ V₂ ∪ {S}, Σ₁ ∪ Σ₂, R₁ ∪ R₂ ∪ {S → S₁, S → S₂}, S).

• The context-free languages are closed under concatenation: If L₁ and L₂ are
context-free languages, then there exist context-free grammars G₁ = (V₁,
Σ₁, R₁, S₁) and G₂ = (V₂, Σ₂, R₂, S₂) such that L₁ = L(G₁) and L₂ = L(G₂).
If necessary, rename the nonterminals of G₁ and G₂ so that the two sets are
disjoint and so that neither includes the symbol S. We will build a new grammar G
such that L(G) = L(G₁) L(G₂). G will contain all the rules of both G₁ and G₂.
We add to G a new start symbol, S, and one new rule, S → S₁ S₂. So G = (V₁
∪ V₂ ∪ {S}, Σ₁ ∪ Σ₂, R₁ ∪ R₂ ∪ {S → S₁ S₂}, S).

• The context-free languages are closed under Kleene star: If L₁ is a
context-free language, then there exists a context-free grammar G₁ = (V₁, Σ₁, R₁, S₁)
such that L₁ = L(G₁). If necessary, rename the nonterminals of G₁ so that V₁
does not include the symbol S. We will build a new grammar G such that
L(G) = L(G₁)*. G will contain all the rules of G₁. We add to G a new start
symbol, S, and two new rules, S → ε and S → S S₁. So G = (V₁ ∪ {S}, Σ₁,
R₁ ∪ {S → ε, S → S S₁}, S).

• The context-free languages are closed under reverse: Recall that L^R =
{w ∈ Σ* : w = x^R for some x ∈ L}. If L is a context-free language, then it is
generated by some Chomsky normal form grammar G = (V, Σ, R, S). Every
rule in G is of the form X → BC or X → a, where X, B, and C are elements of
V − Σ and a ∈ Σ. In the latter case L(X) = {a}. {a}^R = {a}. In the former
case, L(X) = L(B)L(C). By Theorem 2.4, (L(B)L(C))^R = L(C)^R L(B)^R. So
we construct, from G, a new grammar G′, such that L(G′) = L^R. G′ =
(V_G, Σ_G, R′, S_G), where R′ is constructed as follows:

    • For every rule in G of the form X → BC, add to R′ the rule X → CB.

    • For every rule in G of the form X → a, add to R′ the rule X → a.

• The context-free languages are closed under letter substitution, defined
as follows: Consider any two alphabets, Σ₁ and Σ₂. Let sub be any function
from Σ₁ to Σ₂*. Then letsub is a letter substitution function from L₁
to L₂ iff letsub(L₁) = {w ∈ Σ₂* : ∃y ∈ L₁ (w = y except that every character
c of y has been replaced by sub(c))}. We leave the proof of this as an
exercise.
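The union, concatenation, and Kleene star constructions can be sketched directly as operations on grammars. The representation below is an assumption made for illustration: a grammar is a triple (nonterminals, rules, start), each rule is a pair (left-hand side, tuple of right-hand-side symbols), terminals are single characters, and Σ is left implicit. The helper `strings_up_to`, which enumerates the strings of bounded length that a grammar generates, is likewise hypothetical and exists only to test the constructions.

```python
def tag(G, suffix):
    """Rename every nonterminal by appending a suffix, so that sets are disjoint."""
    V, R, S = G
    ren = lambda s: s + suffix if s in V else s
    return ({ren(A) for A in V},
            [(ren(lhs), tuple(ren(s) for s in rhs)) for lhs, rhs in R],
            ren(S))


def union(G1, G2):
    (V1, R1, S1), (V2, R2, S2) = tag(G1, "_1"), tag(G2, "_2")
    return (V1 | V2 | {"S"}, R1 + R2 + [("S", (S1,)), ("S", (S2,))], "S")


def concat(G1, G2):
    (V1, R1, S1), (V2, R2, S2) = tag(G1, "_1"), tag(G2, "_2")
    return (V1 | V2 | {"S"}, R1 + R2 + [("S", (S1, S2))], "S")


def star(G1):
    V1, R1, S1 = tag(G1, "_1")
    return (V1 | {"S"}, R1 + [("S", ()), ("S", ("S", S1))], "S")


def strings_up_to(G, n):
    """All strings of length <= n generated by G, found by fixpoint iteration."""
    V, R, S = G
    lang = {A: set() for A in V}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in R:
            partial = {""}
            for sym in rhs:
                options = lang[sym] if sym in V else {sym}
                partial = {p + o for p in partial for o in options
                           if len(p) + len(o) <= n}
            if not partial <= lang[lhs]:
                lang[lhs] |= partial
                changed = True
    return lang[S]


# Example grammars (assumed for this sketch): T -> aTb | ε and C -> cC | ε.
ANBN = ({"T"}, [("T", ("a", "T", "b")), ("T", ())], "T")
CSTAR = ({"C"}, [("C", ("c", "C")), ("C", ())], "C")

assert strings_up_to(union(ANBN, CSTAR), 4) == {"", "ab", "aabb", "c", "cc", "ccc", "cccc"}
assert strings_up_to(concat(ANBN, CSTAR), 3) == {"", "c", "cc", "ccc", "ab", "abc"}
assert strings_up_to(star(ANBN), 4) == {"", "ab", "aabb", "abab"}
```

Appending distinct suffixes plays the role of the renaming step in the proof: after tagging, the two nonterminal sets are disjoint and neither contains the new start symbol S.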

As  with  regular  languages,  we  can  use  these  closure  theorems  as  a  way  to  prove  that 
a  more  complex  language  is  context-free  if  it  can  be  shown  to  be  built  from  simpler 
ones  using  operations  under  which  the  context-free  languages  are  closed. 

THEOREM  13.6  Nonclosure  Under  Intersection,  Complement,  and  Difference 

Theorem: The context-free languages are not closed under intersection,
complement, or difference.

Proof: 

• The context-free languages are not closed under intersection: The proof is by
counterexample. Let:

L₁ = {a^n b^n c^m : n, m ≥ 0}.    /* equal a's and b's.

L₂ = {a^m b^n c^n : n, m ≥ 0}.    /* equal b's and c's.

Both L₁ and L₂ are context-free since there exist straightforward context-free
grammars for them.

But now consider:

L = L₁ ∩ L₂
  = {a^n b^n c^n : n ≥ 0}.

If the context-free languages were closed under intersection, L would have to
be context-free. But we proved, in Example 13.1, that it isn't.

• The context-free languages are not closed under complement: Given any sets
L₁ and L₂,

L₁ ∩ L₂ = ¬(¬L₁ ∪ ¬L₂).

The context-free languages are closed under union. So, if they were also closed
under complement, they would necessarily be closed under intersection. But we
just showed that they are not. Thus they are not closed under complement either.
We've also seen an example that proves this claim directly. ¬AⁿBⁿCⁿ is
context-free. We showed a PDA that accepts it in Example 12.8. But ¬(¬AⁿBⁿCⁿ)
= AⁿBⁿCⁿ is not context-free.

• The context-free languages are not closed under difference (subtraction):
Given any language L,

¬L = Σ* − L.

Σ* is context-free. So, if the context-free languages were closed under
difference, the complement of any context-free language would necessarily be
context-free. But we just showed that that is not so.
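The counterexample for intersection can be checked on all short strings. The sketch below, with hypothetical helper names, enumerates every string over {a, b, c} of length at most 6 and confirms that the strings in both L₁ and L₂ are exactly those of the form a^n b^n c^n. It illustrates the claim for short strings only and is not a proof.

```python
import re
from itertools import product


def counts(s):
    """Return (#a, #b, #c) if s has the form a*b*c*, else None."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return None if m is None else tuple(len(g) for g in m.groups())


def in_L1(s):  # {a^n b^n c^m}: equal a's and b's
    c = counts(s)
    return c is not None and c[0] == c[1]


def in_L2(s):  # {a^m b^n c^n}: equal b's and c's
    c = counts(s)
    return c is not None and c[1] == c[2]


def in_anbncn(s):
    c = counts(s)
    return c is not None and c[0] == c[1] == c[2]


strings = ["".join(t) for n in range(7) for t in product("abc", repeat=n)]
both = {s for s in strings if in_L1(s) and in_L2(s)}

assert both == {s for s in strings if in_anbncn(s)}
assert both == {"", "abc", "aabbcc"}
```

Each of L₁ and L₂ constrains only one pair of adjacent blocks, which a single stack can handle; the intersection constrains both pairs at once.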


Recall that, in using the regular Pumping Theorem to show that some language L
was not regular, we sometimes found it useful to begin by intersecting L with another
regular language to create a new language L′. Since the regular languages are closed
under intersection, L′ would necessarily be regular if L were. We then showed that L′,
designed to be simpler to work with, was not regular. And so neither was L.

It would be very useful to be able to exploit this technique when using the
context-free Pumping Theorem. Unfortunately, as we have just shown, the context-free
languages are not closed under intersection. Fortunately, however, they are closed under
intersection with the regular languages. We'll prove this result next and then, in Section
13.4.2, we'll show how it can be exploited in a proof that a language is not context-free.


THEOREM  13.7  Closure  Under  Intersection  With  the  Regular  Languages 

Theorem:  The  context-free  languages  are  closed  under  intersection  with  the  regular 
languages. 

Proof: The proof is by construction. If L₁ is context-free, then there exists some
PDA M₁ = (K₁, Σ, Γ₁, Δ₁, s₁, A₁) that accepts it. If L₂ is regular then there exists
a DFSM M₂ = (K₂, Σ, δ, s₂, A₂) that accepts it. We construct a new PDA, M₃,
that accepts L₁ ∩ L₂. M₃ will work by simulating the parallel execution of M₁
and M₂. The states of M₃ will be ordered pairs of states of M₁ and M₂. As each
input character is read, M₃ will simulate both M₁ and M₂ moving appropriately
to a new state. M₃ will have a single stack, which will be controlled by M₁. The
only slightly tricky thing is that M₁ may contain ε-transitions. So M₃ will have to
allow M₁ to follow them while M₂ just stays in the same state and waits until the
next input symbol is read.

M₃ = (K₁ × K₂, Σ, Γ₁, Δ₃, (s₁, s₂), A₁ × A₂), where Δ₃ is built as follows:

• For each transition ((q₁, a, β), (p₁, γ)) in Δ₁,
and each transition ((q₂, a), p₂) in δ, add to Δ₃ the transition:

(((q₁, q₂), a, β), ((p₁, p₂), γ)).

• For each transition ((q₁, ε, β), (p₁, γ)) in Δ₁,
and each state q₂ in K₂, add to Δ₃ the transition:

(((q₁, q₂), ε, β), ((p₁, q₂), γ)).

We  define  intersectPDAandFSM  as  follows: 

intersectPDAandFSM(M₁: PDA, M₂: FSM) =

    Build M₃ as defined in the proof of Theorem 13.7.
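The construction of M₃ can be sketched in code. The encoding here is an assumption chosen for illustration: a PDA is a tuple (states, start, accepting, transitions) whose transitions have the form ((q, a, pop), (p, push)) with the empty string standing for ε and the stack top at the left end of a string; a DFSM is a tuple (states, start, accepting, delta) with delta a dictionary. The simulator `accepts` is a hypothetical helper that accepts in an accepting state with all input read and an empty stack, and it caps the stack height at |w| + 1, which is sufficient for the example PDA but not for PDAs in general.

```python
def intersect_pda_and_fsm(pda, dfsm):
    """Build the product PDA M3 following the two cases in the proof of Theorem 13.7."""
    K1, s1, A1, delta1 = pda
    K2, s2, A2, delta2 = dfsm
    delta3 = []
    for (q1, a, beta), (p1, gamma) in delta1:
        for q2 in K2:
            if a == "":                    # ε-transition: M2 stays where it is
                delta3.append((((q1, q2), "", beta), ((p1, q2), gamma)))
            elif (q2, a) in delta2:        # both machines read the character a
                delta3.append((((q1, q2), a, beta), ((p1, delta2[(q2, a)]), gamma)))
    K3 = {(q1, q2) for q1 in K1 for q2 in K2}
    A3 = {(q1, q2) for q1 in A1 for q2 in A2}
    return K3, (s1, s2), A3, delta3


def accepts(pda, w):
    """Search the configurations of a nondeterministic PDA on input w."""
    _, s, A, delta = pda
    frontier, seen = [(s, 0, "")], set()
    while frontier:
        cfg = frontier.pop()
        if cfg in seen:
            continue
        seen.add(cfg)
        q, i, stack = cfg
        if q in A and i == len(w) and stack == "":
            return True
        for (q0, a, beta), (p, gamma) in delta:
            if q0 != q or not stack.startswith(beta):
                continue
            if a != "" and (i >= len(w) or w[i] != a):
                continue
            new_stack = gamma + stack[len(beta):]
            if len(new_stack) <= len(w) + 1:   # assumed cap; see the lead-in
                frontier.append((p, i + len(a), new_stack))
    return False


# M1 accepts {a^n b^n}; M2 accepts strings with an even number of a's.
M1 = ({"s", "f"}, "s", {"s", "f"},
      [(("s", "a", ""), ("s", "a")),
       (("s", "b", "a"), ("f", "")),
       (("f", "b", "a"), ("f", ""))])
M2 = ({"e", "o"}, "e", {"e"},
      {("e", "a"): "o", ("o", "a"): "e", ("e", "b"): "e", ("o", "b"): "o"})

M3 = intersect_pda_and_fsm(M1, M2)
assert accepts(M3, "aabb") and not accepts(M3, "ab")
```

In this example M₃ accepts exactly the strings a^n b^n with n even: the stack, inherited from M₁, enforces the matching, while the second component of each state tracks the parity that M₂ computes.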

THEOREM  13.8  Closure  Under  Difference  with  the  Regular  Languages 

Theorem: The difference (L₁ − L₂) between a context-free language L₁ and a
regular language L₂ is context-free.

Proof: L₁ − L₂ = L₁ ∩ ¬L₂. If L₂ is regular, then, since the regular languages are
closed under complement, ¬L₂ is also regular. Since L₁ is context-free, by Theorem
13.7, L₁ ∩ ¬L₂ is context-free.


The  last  two  theorems  are  important  tools,  both  for  showing  that  a  language  is 
context-free  and  for  showing  that  a  language  is  not  context-free. 


EXAMPLE  13.5  Using  Closure  Theorems  to  Prove  A  Language 
Context-Free 

Consider the perhaps contrived language L = {a^n b^n : n ≥ 0 and n ≠ 1776}.
Another way to describe L is that it is {a^n b^n : n ≥ 0} − {a^1776 b^1776}. AⁿBⁿ =
{a^n b^n : n ≥ 0} is context-free. We have shown both a simple grammar that
generates it and a simple PDA that accepts it. {a^1776 b^1776} is finite and thus regular. So,
by Theorem 13.8, L is context-free.


Generalizing  that  example  a  bit,  from  Theorem  13.8  it  follows  that  any  language  that 
can  be  described  as  the  result  of  subtracting  a  finite  number  of  elements  from  some 
language  known  to  be  context-free  must  also  be  context-free. 

13.4.2 Using the Pumping Theorem in Conjunction with the Closure Properties

Languages  that  impose  no  specific  order  constraints  on  the  symbols  contained  in  their 
strings  are  not  always  context-free.  But  it  may  be  hard  to  prove  that  one  isn’t  just  by 
using  the  Pumping  Theorem.  In  such  a  case,  it  is  often  useful  to  exploit  Theorem  13.7, 
which tells us that the context-free languages are closed under intersection with the
regular  languages. 

Recall our notational convention from Section 13.3: (n, n) means that all nonempty
substrings of vy occur in region n. This may happen either because v and y are both
nonempty and both occur in region n, or because one of them is empty and the
nonempty one occurs in region n.


Are  natural  languages  like  English  or  Chinese  or  German  context-free?  (L.3.3) 


EXAMPLE  13.6  WW  is  Not  Context-Free 

Let WW = {ww : w ∈ {a, b}*}. WW is similar to WcW = {wcw : w ∈ {a, b}*},
except that there is no longer a middle marker. Because, like WcW, it contains
cross-serial dependencies, it is not context-free. We could try proving that by
using the Pumping Theorem alone. Here are some attempts, using various choices
for w:

• Let w = (ab)^(2k). If v = ε and y = abab, pumping works fine: every pumped string
is again an even number of copies of ab and so is still in WW.

• Let w = a^k b a^k b. If v = a and is in the first group of a's and y = a and is in
the second group of a's, pumping works fine.

• Let w = a^k b^k a^k b^k. Now the constraint that |vxy| ≤ k prevents v and y from
both being in the two a regions or the two b regions. This choice of w will lead
to a successful Pumping Theorem proof. But there are four regions in w and
we must consider all the ways in which v and y could overlap those regions,
including all those in which either or both of v and y occur on a region boundary.
While it is possible to write out all those possibilities and show, one at a time,
that every one of them violates at least one condition of the Pumping Theorem,
there is an easier way.

If WW were context-free, then L' = WW ∩ a*b*a*b* would also be context-free.
But it isn't, which we can show using the Pumping Theorem. If it were, then
there would exist some k such that any string w, where |w| ≥ k, must satisfy the
conditions of the theorem. We show one string w that does not. Let w = a^k b^k a^k b^k,
where k is the constant from the Pumping Theorem. For w to satisfy the conditions
of the Pumping Theorem, there must be some u, v, x, y, and z, such that
w = uvxyz, vy ≠ ε, |vxy| ≤ k, and ∀q ≥ 0 (u v^q x y^q z is in L'). We show that no
such u, v, x, y, and z exist. Imagine w divided into four regions as follows:

aaa ... aaa bbb ... bbb aaa ... aaa bbb ... bbb
|     1     |     2     |     3     |     4     |

We consider all the cases for where v and y could fall and show that in none of
them are all the conditions of the theorem met:

• If either v or y overlaps more than one region, set q to 2. The resulting string
will not be in a*b*a*b* and so is not in L'.

• If |vy| is not even then set q to 2. The resulting string will have odd length and
so not be in L'. We assume in all the other cases that |vy| is even.

• (1, 1), (2, 2), (1, 2): Set q to 2. The boundary between the first half and the
second half will shift into the first b region. So the second half will start with a b,
while the first half still starts with an a. So the resulting string is not in L'.

• (3, 3), (4, 4), (3, 4): Set q to 2. This time the boundary shifts into the second a
region. The first half will end with an a while the second half still ends with a b.
So the resulting string is not in L'.

• (2, 3): Set q to 2. If |v| ≠ |y| then the boundary moves and, as argued above,
the resulting string is not in L'. If |v| = |y| then the first half contains more
b's and the second half contains more a's. Since they are no longer the same,
the resulting string is not in L'.

• (1, 3), (1, 4), and (2, 4) violate the requirement that |vxy| ≤ k.

There is no way to divide w into uvxyz such that all the conditions of the
Pumping Theorem are met. So L' is not context-free. So neither is WW.
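
The case analysis above can be spot-checked by brute force for small values of k. The sketch below is not a proof, since it only tries finitely many k and q. It enumerates every division of w into uvxyz with vy ≠ ε and |vxy| ≤ k and reports those for which all pumped strings with q from 0 to 3 stay in the language. The function names and the bound max_q are assumptions made for this illustration.

```python
import re


def in_ww(s):
    """True iff s = ww for some w."""
    half, rem = divmod(len(s), 2)
    return rem == 0 and s[:half] == s[half:]


def in_l_prime(s):
    """True iff s is in L' = WW ∩ a*b*a*b*."""
    return in_ww(s) and re.fullmatch(r"a*b*a*b*", s) is not None


def surviving_decompositions(w, k, member, max_q=3):
    """Return all (u, v, x, y, z) with w = uvxyz, vy != '' and |vxy| <= k
    such that u v^q x y^q z is in the language for every q in 0..max_q."""
    survivors = []
    n = len(w)
    for i in range(n + 1):
        end = min(n, i + k)              # enforces |vxy| <= k
        for j in range(i, end + 1):
            for l in range(j, end + 1):
                for m in range(l, end + 1):
                    u, v, x, y, z = w[:i], w[i:j], w[j:l], w[l:m], w[m:]
                    if not (v or y):
                        continue         # vy must be nonempty
                    if all(member(u + v * q + x + y * q + z)
                           for q in range(max_q + 1)):
                        survivors.append((u, v, x, y, z))
    return survivors


for k in range(1, 5):
    w = "a" * k + "b" * k + "a" * k + "b" * k
    print(k, surviving_decompositions(w, k, in_l_prime))
```

For contrast, calling surviving_decompositions("ab" * 8, 4, in_ww) returns a nonempty list, which matches the observation that some choices of w in WW can be pumped without leaving the language.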


One  reason  that  context-free  grammars  are  typically  too  weak  to  describe 
musical  structures  is  that  they  cannot  describe  constraints  such  as  the  one 
that defines WW. (N.1.2)


EXAMPLE  13.7  A  Simple  Arithmetic  Language  is  Not  Context-Free 

Let L = {x#y = z : x, y, z ∈ {0, 1}* and, if x, y and z are viewed as positive binary
numbers without leading zeros, then xy = z^R}. For example, 100#111 = 00111
∈ L. (We do this example instead of the more natural one in which we require that
xy = z because it seems as though it might be more likely to be context-free. As
we'll see, however, even this simpler variant is not.)

If L were context-free, then L' = L ∩ 10*#1* = 0*1* would also be context-free.
But it isn't, which we can show using the Pumping Theorem. If it were, then there
would exist some k such that any string w, where |w| ≥ k, must satisfy the conditions
of the theorem. We show one string w that does not. Let w = 10^k#1^k = 0^k1^k, where k is
the constant from the Pumping Theorem. Note that w ∈ L because 10^k · 1^k = 1^k0^k.
For w to satisfy the conditions of the Pumping Theorem, there must be some u, v,
x, y, and z, such that w = uvxyz, vy ≠ ε, |vxy| ≤ k, and ∀q ≥ 0 (u v^q x y^q z is in
L'). We show that no such u, v, x, y, and z exist. Imagine w divided into seven
regions as follows:

1 000 ... 000 # 111 ... 111 = 000 ... 000 111 ... 111
|1|    2     |3|    4     |5|    6     |     7     |

We  consider  all  the  cases  for  where  v  and  y  could  fall  and  show  that  in  none  of 
them  are  all  the  conditions  of  the  theorem  met: 

• If either v or y overlaps region 1, 3, or 5 then set q to 0. The resulting string will
not be in 10*#1* = 0*1* and so is not in L'.

• If either v or y contains the boundary between 6 and 7, set q to 2. The resulting
string will not be in 10*#1* = 0*1* and so is not in L'. So the only cases left
to consider are those where v and y each occur within a single region.

• (2, 2), (4, 4), (2, 4): Set q to 2. Because there are no leading zeros, changing the
left side of the string changes its value. But the right side doesn't change to
match. So the resulting string is not in L'.

• (6, 6), (7, 7), (6, 7): Set q to 2. The right side of the equality statement changes
value but the left side doesn't. So the resulting string is not in L'.

• (4, 6): Note that, because of the first argument to the multiplication, the number
of 1's in the second argument must equal the number of 1's after
the =. Set q to 2. The number of 1's in the second argument changed but the
number of 1's in the result did not. So the resulting string is not in L'.

• (2, 6), (2, 7), and (4, 7) violate the requirement that |vxy| ≤ k.

There is no way to divide w into uvxyz such that all the conditions of the
Pumping Theorem are met. So L' is not context-free, and therefore neither is L.
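
The claim that w = 10^k#1^k = 0^k1^k belongs to L rests on the identity 2^k · (2^k − 1) = 1^k0^k in binary. The following sketch checks it numerically for a few values of k. The reading of "without leading zeros" used here (x, y, and the reversal of z must each begin with 1) is an assumption of the sketch, as are the function names.

```python
def in_l(s):
    """Membership test for L = {x#y=z : x * y = reverse(z) as binary numbers}.

    Assumption of this sketch: x, y, and reverse(z) are nonempty binary
    strings that each start with 1 (positive, no leading zeros).
    """
    if s.count("#") != 1 or s.count("=") != 1:
        return False
    left, z = s.split("=")
    if "#" not in left:
        return False
    x, y = left.split("#")
    z_reversed = z[::-1]
    for part in (x, y, z_reversed):
        if not part or set(part) - {"0", "1"} or part[0] != "1":
            return False
    return int(x, 2) * int(y, 2) == int(z_reversed, 2)


def witness(k):
    """The string w = 1 0^k # 1^k = 0^k 1^k used in the proof."""
    return "1" + "0" * k + "#" + "1" * k + "=" + "0" * k + "1" * k


print(in_l("100#111=00111"))                  # the example from the text
print(all(in_l(witness(k)) for k in range(1, 9)))
```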


Sometimes the closure theorems can be used to reduce the proof that a new language
L is not context-free to the proof that some other language L' is not context-free,
where we have already proven the case for L'.


EXAMPLE  13.8  Using  Intersection  to  Force  Order  Constraints 

Let L = {w ∈ {a, b, c}* : #a(w) = #b(w) = #c(w)}. If L were context-free, then
L' = L ∩ a*b*c* would also be context-free. But L' = AnBnCn, which is not
context-free, so neither is L.

13.5 Deterministic Context-Free Languages •

The regular languages are closed under complement, intersection, and difference. Why
are the context-free languages different? In a nutshell, because the machines that
accept them may necessarily be nondeterministic. Recall the technique that we used, in
the proof of Theorem 8.4, to show that the regular languages are closed under
complement: Given a (possibly nondeterministic) FSM M1, we used the following procedure
to construct a new FSM M2 such that L(M2) = ¬L(M1):

1. From M1, construct an equivalent DFSM M', using the algorithm ndfsmtodfsm,
presented in the proof of Theorem 5.3. (If M1 is already deterministic, M' = M1.)

2. M' must be stated completely, so if it is described with an implied dead state, add
the dead state and all required transitions to it.

3. Begin building M2 by setting it equal to M'. Then swap the accepting and the
nonaccepting states. So M2 = (K_M', Σ, δ_M', s_M', K_M' − A_M').
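
Steps 2 and 3 can be sketched in Python for a machine that is already deterministic. Step 1 (ndfsmtodfsm) is not reproduced here. The tuple encoding, the function names, and the name chosen for the dead state are assumptions of this illustration.

```python
def complement_dfsm(states, alphabet, delta, start, accepting, dead=("dead",)):
    """Complement a DFSM whose transition table may have an implied dead state.

    delta is a dict {(state, symbol): state}. The name `dead` is assumed
    not to clash with any existing state.
    """
    states, delta = set(states), dict(delta)
    # Step 2: make the machine complete by adding an explicit dead state.
    if any((q, c) not in delta for q in states for c in alphabet):
        assert dead not in states, "dead-state name must be fresh"
        states.add(dead)
        for q in states:
            for c in alphabet:
                delta.setdefault((q, c), dead)
    # Step 3: swap accepting and nonaccepting states.
    return states, set(alphabet), delta, start, states - set(accepting)


def accepts(dfsm, w):
    """Run a complete DFSM on w."""
    _, _, delta, q, accepting = dfsm
    for c in w:
        q = delta[(q, c)]
    return q in accepting


# A DFSM for a*b over {a, b}, written with an implied dead state.
m2 = complement_dfsm({0, 1}, {"a", "b"}, {(0, "a"): 0, (0, "b"): 1}, 0, {1})
print(accepts(m2, "aab"), accepts(m2, "aba"))
```

If step 2 were skipped, strings such as aba that drive the original machine into its implied dead state would be rejected by both the machine and its supposed complement.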

We have no PDA equivalent of ndfsmtodfsm, so we cannot simply adapt this
construction for PDAs. Our proofs that the regular languages are closed under
intersection and difference relied on the fact that they were closed under complement, so we
cannot adapt those proofs here either.

We have no PDA equivalent of ndfsmtodfsm because there provably isn't one, as we
will show shortly. Recall that, in Section 12.2, we defined a PDA M to be deterministic iff:

• Δ_M contains no pairs of transitions that compete with each other, and

• if q is an accepting state of M, then there is no transition ((q, ε, ε), (p, a)) for any p or a.

In other words, M never has a choice between two or more moves, nor does it have
a choice between moving and accepting. There exist context-free languages that cannot
be  accepted  by  any  deterministic  PDA.  But  suppose  that  we  restrict  our  attention  to 
the  ones  that  can. 

What  is  a  Deterministic  Context-Free  Language? 

We  are  about  to  define  the  class  of  deterministic  context-free  languages.  Because  this 
class  is  useful,  we  would  like  it  to  be  as  large  as  possible.  So  let  $  be  an  end-of-string 
marker. We could use any symbol that is not in Σ_L (for example <line feed> or <cr>),
but $ is easier to read. A language L is deterministic context-free iff L$ can be accepted
by some deterministic PDA.

To  see  why  we  have  defined  the  deterministic  context-free  languages  to  exploit  an 
end-of-string  marker,  consider  the  following  example  of  a  straightforward  language  for 
which  no  deterministic  PDA  exists  unless  an  end-of-string  marker  is  used. 


EXAMPLE  13.9  Why  an  End-of-String  Marker  is  Useful 

Let L = a* ∪ {a^n b^n : n > 0}. Consider any PDA M that accepts L. When it begins
reading a's, M must push them onto the stack in case there are going to be b's.
But, if it runs out of input without seeing b's, it needs a way to pop those a's from
the stack before it can accept. Without an end-of-string marker, there is no way to
allow that popping to happen only when all the input has been read. So, for example,
the following PDA accepts L, but it is nondeterministic because the transition
to state 3 (where the a's will be popped) can compete with both of the other
transitions from state 1.


With an end-of-string marker, we can build the following deterministic PDA,
which can only take the transition to state 3, the a-popping state, when it sees the $:
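
The behavior of such a machine can be sketched as a deterministic Python loop over the input with an explicit stack. This is an illustration of the idea, not a transcription of the diagram: only state 3 as the a-popping state comes from the text, while the other state numbers and the function name are assumptions.

```python
def accepts_l_dollar(w):
    """Deterministically decide whether w is in L$, for L = a* ∪ {a^n b^n : n > 0}.

    States (1, 2, and 4 are assumed labels): 1 reads a's and pushes them,
    2 matches b's against stacked a's, 3 pops leftover a's after $,
    4 is reached on $ when the b's matched exactly.
    """
    stack, state = [], 1
    for ch in w:
        if state == 1 and ch == "a":
            stack.append("a")
        elif state == 1 and ch == "b" and stack:
            stack.pop()
            state = 2
        elif state == 2 and ch == "b" and stack:
            stack.pop()
        elif state == 1 and ch == "$":
            state = 3
            stack.clear()        # state 3 pops every remaining a via ε-moves
        elif state == 2 and ch == "$" and not stack:
            state = 4
        else:
            return False         # no transition applies: the machine halts
    return state in (3, 4) and not stack


print([s for s in ["$", "aa$", "aabb$", "aab$", "ab"] if accepts_l_dollar(s)])
```

At every step at most one branch applies, and the decision to discard the stacked a's is taken only on reading $, which is exactly what the marker makes possible.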


Before we go any farther, we have to be sure of one thing. We introduced the
end-of-string marker to make it easier to build PDAs that are deterministic. We need to
make sure that it doesn't make it possible to build a PDA for a language L that was not
already context-free. In other words, adding the end-of-string marker cannot convert a
language that was not context-free into one that is. We do that next.

THEOREM  13.9  CFLs  and  Deterministic  CFLs 

Theorem:  Every  deterministic  context-free  language  (as  just  defined)  is  context-free. 

Proof: If L is deterministic context-free, then L$ is accepted by some deterministic
PDA M = (K, Σ, Γ, Δ, s, A). From M, we construct M' such that L(M') = L.
The idea is that, whatever M can do on reading $, M' can do on reading ε (i.e., by
simply guessing that it is at the end of the input). But, as soon as M' makes that
guess, it cannot read any more input. It may perform the rest of its computation
(such as popping its stack), but any path that pretends it has seen the $ before it
has read all of its input will fail to accept. To enable M' to perform whatever stack
operations M could have performed, but not to read any input, M' will be
composed of two copies of M. The first copy will be identical to M, and M' will
operate in that part of itself until it guesses that it is at the end of the input; the second
copy will be identical to M except that it contains only the transitions that do not
consume any input. The states in the first copy will be labeled as in M. Those in
the second copy will have the prime symbol appended to their names. So, if M
contains the transition ((q, ε, γ1), (p, γ2)), M' will contain the transition
((q', ε, γ1), (p', γ2)). The two copies will be connected by finding, in the first copy
of M, every $-transition from some state q to some state p. We replace each such
transition with an ε-transition into the second copy. So the new transition goes
from q to p'.

We can define the following procedure to construct M':

without$(M: PDA) =

1. Initially, set M' to M.

/* Make the copy that does not read any input.

2. For every state q in M, add to M' a new state q'.

3. For every transition ((q, ε, γ1), (p, γ2)) in Δ_M do:

3.1. Add to Δ_M' the transition ((q', ε, γ1), (p', γ2)).

/* Link up the two copies.

4. For every transition ((q, $, γ1), (p, γ2)) in Δ_M do:

4.1. Add to Δ_M' the transition ((q, ε, γ1), (p', γ2)).

4.2. Remove ((q, $, γ1), (p, γ2)) from Δ_M'.

/* Set the accepting states of M'.

5. A_M' = {q' : q ∈ A}.
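
The procedure can be sketched in Python, together with a small breadth-first simulator used only to try it out. The encodings are assumptions of this illustration: a PDA is a tuple (K, Δ, s, A), a transition is ((q, c, pop), (p, push)) with "" for ε, the stack is a string whose top is its first character, primed states are represented as pairs (q, "'"), and acceptance requires an accepting state with empty stack at the end of the input. The simulator's stack bound is an arbitrary safeguard.

```python
from collections import deque


def without_dollar(pda, marker="$"):
    """Build M' from M following steps 1 to 5 of without$."""
    states, delta, start, accepting = pda
    prime = lambda q: (q, "'")
    new_states = set(states) | {prime(q) for q in states}          # step 2
    new_delta = set()
    for (q, c, pop), (p, push) in delta:
        if c == marker:
            # Steps 4.1 and 4.2: replace the $-move by an ε-move into the copy.
            new_delta.add(((q, "", pop), (prime(p), push)))
        else:
            new_delta.add(((q, c, pop), (p, push)))                 # step 1
            if c == "":
                # Step 3.1: the copy keeps only moves that read no input.
                new_delta.add(((prime(q), "", pop), (prime(p), push)))
    return new_states, new_delta, start, {prime(q) for q in accepting}  # step 5


def accepts(pda, w, max_stack=50):
    """Breadth-first search over configurations (state, position, stack)."""
    _, delta, start, accepting = pda
    seen, queue = set(), deque([(start, 0, "")])
    while queue:
        q, i, stack = queue.popleft()
        if (q, i, stack) in seen or len(stack) > max_stack:
            continue
        seen.add((q, i, stack))
        if i == len(w) and stack == "" and q in accepting:
            return True
        for (q1, c, pop), (p, push) in delta:
            if q1 != q or not stack.startswith(pop):
                continue
            rest = push + stack[len(pop):]
            if c == "":
                queue.append((p, i, rest))
            elif i < len(w) and w[i] == c:
                queue.append((p, i + 1, rest))
    return False


# A deterministic PDA for {a^n b^n : n >= 0}$ (an assumed example machine).
m = ({1, 2, 3},
     {((1, "a", ""), (1, "a")), ((1, "b", "a"), (2, "")), ((2, "b", "a"), (2, "")),
      ((1, "$", ""), (3, "")), ((2, "$", ""), (3, ""))},
     1, {3})
m_prime = without_dollar(m)
print(accepts(m, "aabb$"), accepts(m_prime, "aabb"), accepts(m_prime, "aab"))
```

Note that M' is in general nondeterministic: the ε-moves into the second copy compete with the moves that read input, which is why the theorem claims only that L is context-free.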

Closure  Properties  of  the  Deterministic  Context-Free  Languages 

The deterministic context-free languages are of great practical significance because it is
possible to build deterministic, linear time parsers for them. They also possess
additional formal properties that are important, among other reasons, because they enable
us to prove that not all context-free languages are deterministic context-free. The most
important of these is that the deterministic context-free languages, unlike the larger
class of context-free languages, are closed under complement.


THEOREM  13.10  Closure  Under  Complement 

Theorem.  The  deterministic  context-free  languages  are  closed  under  complement. 

Proof: The proof is by construction. If L is a deterministic context-free language
over the alphabet Σ, then L$ is accepted by some deterministic PDA
M = (K, Σ ∪ {$}, Γ, Δ, s, A). We need to describe an algorithm that constructs
a new deterministic PDA that accepts (¬L)$. To prove Theorem 8.4 (that the
regular languages are closed under complement), we defined a construction that
proceeded in two steps: Given an arbitrary FSM, convert it to an equivalent
DFSM, and then swap accepting and nonaccepting states. We can skip the first
step here, but we must solve a new problem. A deterministic PDA may fail to
accept an input string w for any one of several reasons:

1. Its computation ends before it finishes reading w.

2. Its computation ends in an accepting state but the stack is not empty.

3. Its computation loops forever, following ε-transitions, without ever halting
in either an accepting or a nonaccepting state.

4. Its computation ends in a nonaccepting state.

If we simply swap accepting and nonaccepting states we will correctly fail to
accept every string that M would have accepted (i.e., every string in L$). But we
will not necessarily accept every string in (¬L)$. To do that, we must also address
issues 1 through 3 above.

An additional problem is that we don't want to accept ¬L(M). That includes strings
that do not end in $. We must accept only strings that do end in $ and that are in
(¬L)$.

A construction that solves these problems is given in D.2.

What else can we say about the deterministic context-free languages? We know that
they are closed under complement. What about union and intersection? We observe
that L1 ∩ L2 = ¬(¬L1 ∪ ¬L2). So, if the deterministic context-free languages were
closed under union, they would necessarily be closed under intersection also. But they
are not closed under union. The context-free languages are closed under union, so the
union of two deterministic context-free languages must be context-free. It may, however,
not be deterministic. The deterministic context-free languages are also not closed under
intersection. In fact, when two deterministic context-free languages are intersected, the
result may not even be context-free.

THEOREM  13.11  Nonclosure  Under  Union 

Theorem:  The  deterministic  context-free  languages  are  not  closed  under  union. 

Proof:  We  show  a  counterexample: 

Let L1 = {a^i b^j c^k : i, j, k ≥ 0 and i ≠ j}.

Let L2 = {a^i b^j c^k : i, j, k ≥ 0 and j ≠ k}.

Let L' = L1 ∪ L2
= {a^i b^j c^k : i, j, k ≥ 0 and ((i ≠ j) or (j ≠ k))}.

Let L'' = ¬L'
= {a^i b^j c^k : i, j, k ≥ 0 and i = j = k} ∪
{w ∈ {a, b, c}* : the letters are out of order}.

Let L''' = L'' ∩ a*b*c*
= {a^n b^n c^n : n ≥ 0}.

L1 and L2 are deterministic context-free. Deterministic PDAs that accept L1$
and L2$ can be constructed using the same approach we used to build a
deterministic PDA for L = {a^m b^n : m ≠ n; m, n > 0} in Example 12.7. Their union
L' is context-free but it cannot be deterministic context-free. If it were, then its
complement L'' would also be deterministic context-free and thus context-free.
But it isn't. If it were context-free, then L''', the intersection of L'' with a*b*c*,
would also be context-free since the context-free languages are closed under
intersection with the regular languages. But L''' is AnBnCn = {a^n b^n c^n : n ≥ 0},
which we have shown is not context-free.
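
The chain of set identities used in this proof can be checked exhaustively on short strings. The sketch below encodes each language as a membership predicate and confirms that complementing the union of L1 and L2 and then intersecting with a*b*c* gives exactly AnBnCn for every string over {a, b, c} up to a chosen length. The function names and the length bound are assumptions of the illustration, and a finite check of this kind is a sanity test rather than a proof.

```python
import re
from itertools import product


def counts(w):
    """Return (i, j, k) if w = a^i b^j c^k, else None."""
    m = re.fullmatch(r"(a*)(b*)(c*)", w)
    return None if m is None else tuple(len(g) for g in m.groups())


def in_l1(w):
    c = counts(w)
    return c is not None and c[0] != c[1]


def in_l2(w):
    c = counts(w)
    return c is not None and c[1] != c[2]


def in_l_union(w):             # L' = L1 ∪ L2
    return in_l1(w) or in_l2(w)


def in_l_complement(w):        # L'' = ¬L'
    return not in_l_union(w)


def in_l_triple(w):            # L''' = L'' ∩ a*b*c*
    return in_l_complement(w) and counts(w) is not None


def in_anbncn(w):
    c = counts(w)
    return c is not None and c[0] == c[1] == c[2]


def all_strings(max_len, alphabet="abc"):
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


print(all(in_l_triple(w) == in_anbncn(w) for w in all_strings(7)))
```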

THEOREM  13.12  Nonclosure  Under  Intersection 

Theorem:  The  deterministic  context-free  languages  are  not  closed  under  intersection. 

Proof:  We  show  a  counterexample: 

Let L1 = {a^i b^j c^k : i, j, k ≥ 0 and i = j}.

Let L2 = {a^i b^j c^k : i, j, k ≥ 0 and j = k}.

Let L' = L1 ∩ L2
= {a^n b^n c^n : n ≥ 0}.

L1 and L2 are deterministic context-free. The deterministic PDA shown in
Figure 13.4 accepts L1$. A similar one accepts L2$. But we have shown that their
intersection L' is not context-free, much less deterministic context-free.


A  Hierarchy  within  the  Class  of  Context-Free  Languages 

The  most  important  result  of  this  section  is  the  following  theorem:  There  are  context-free 
languages that are not deterministic context-free. Since there are context-free languages
for  which  no  deterministic  PDA  exists,  there  can  exist  no  equivalent  of  ndfsmtodfsm  for 
PDAs.  Nondeterminism  is  a  fact  of  life  when  working  with  PDAs  unless  we  are  willing  to 
work  only  with  languages  that  have  been  designed  to  be  deterministic. 


FIGURE 13.4 A deterministic PDA that accepts {a^i b^j c^k : i, j, k ≥ 0 and i = j}.

The fact that there are context-free languages that are not deterministic poses a
problem for the design of efficient parsing algorithms. The best parsing algorithms
we have sacrifice either generality (i.e., they cannot correctly parse all
context-free languages) or efficiency (i.e., they do not run in time that is linear in
the length of the input). In Chapter 15, we will describe some of these algorithms.


THEOREM 13.13 Some CFLs are not Deterministic

Theorem:  The  class  of  deterministic  context-free  languages  is  a  proper  subset  of  the 
class  of  context-free  languages.  Thus  there  exist  nondeterministic  PDAs  for 
which  no  equivalent  deterministic  PDA  exists. 

Proof:  By  Theorem  13.9,  every  deterministic  context-free  language  is  context-free. 
So  all  that  remains  is  to  show  that  there  exists  at  least  one  context-free  language 
that  is  not  deterministic  context-free. 

Consider L = {a^i b^j c^k : i, j, k ≥ 0 and ((i ≠ j) or (j ≠ k))}. L is context-free.
The construction of a grammar for it was an exercise in Chapter 11. But we can
show that L is not deterministic context-free by the same argument that we used
in the proof of Theorem 13.11. If L were deterministic context-free, then, by
Theorem 13.10, its complement L' = {a^i b^j c^k : i, j, k ≥ 0 and i = j = k} ∪
{w ∈ {a, b, c}* : the letters are out of order} would also be deterministic context-free and
thus context-free. If L' were context-free, then L'' = L' ∩ a*b*c* would also be
context-free (since the context-free languages are closed under intersection with
the regular languages). But L'' = AnBnCn = {a^n b^n c^n : n ≥ 0}, which is not
context-free. So L is context-free but not deterministic context-free.

Since L is context-free, it is accepted by some (nondeterministic) PDA M. M is
an example of a nondeterministic PDA for which no equivalent deterministic PDA
exists. If such a deterministic PDA did exist and accept L, it could be converted into
a deterministic PDA that accepted L$. But, if that machine existed, L would be
deterministic context-free and we just showed that it is not.

We get the class of deterministic context-free languages when we think about the
context-free languages from the perspective of PDAs that accept them. Recall from
Section 11.7.3 that, when we think about the context-free languages from the
perspective of the grammars that generate them, we also get a subclass of languages that are, in
some sense, "easier" than others: There are context-free languages for which
unambiguous grammars exist and there are others that are inherently ambiguous, by which
we mean that every corresponding grammar is ambiguous.

EXAMPLE  13.10  Inherent  Ambiguity  versus  Nondeterminism 

Recall the language L1 = {a^i b^j c^k : i, j, k ≥ 0 and ((i = j) or (j = k))}, which can
also be described as {a^n b^n c^m : n, m ≥ 0} ∪ {a^n b^m c^m : n, m ≥ 0}. L1 is inherently
ambiguous because every string that is also in AnBnCn = {a^n b^n c^n : n ≥ 0} is an
element of both sublanguages and so has at least two derivations in any grammar for L1.

Now consider the slightly different language L2 = {a^n b^n c^m d : n, m ≥ 0} ∪
{a^n b^m c^m e : n, m ≥ 0}. L2 is not inherently ambiguous. It is straightforward to
write an unambiguous grammar for each of the two sublanguages and any
string in L2 is an element of only one of them (since each such string must end
in d or e but not both). L2 is not, however, deterministic. There exists no PDA
that can decide which of the two sublanguages a particular string is in until it
has consumed the entire string.


What  is  the  relationship  between  the  deterministic  context-free  languages  and  the 
languages  that  are  not  inherently  ambiguous?  The  answer  is  shown  in  Figure  13.5. 

The  subset  relations  shown  in  the  figure  are  proper: 

• There exist deterministic context-free languages that are not regular. These
languages are in the innermost donut in the figure. One example is AnBn = {a^n b^n :
n ≥ 0}.

• There exist languages that are not in the inner donut (i.e., they are not
deterministic). But they are context-free and not inherently ambiguous. Two examples of
languages in this second donut are:

  • PalEven = {ww^R : w ∈ {a, b}*}. The grammar we showed for it in Example
  11.3 is unambiguous.

  • {a^n b^n c^m d : n, m ≥ 0} ∪ {a^n b^m c^m e : n, m ≥ 0}.

• There exist languages that are in the outer donut because they are inherently
ambiguous. Two examples are:

  • {a^i b^j c^k : i, j, k ≥ 0 and ((i = j) or (j = k))}.

  • {a^i b^j c^k : i, j, k ≥ 0 and ((i ≠ j) or (j ≠ k))}.


FIGURE 13.5 A hierarchy within the class of context-free languages.

To prove that the figure is properly drawn requires two additional results:

THEOREM  13.14  Every  Regular  Language  is  Deterministic  Context-Free 

Theorem:  Every  regular  language  is  deterministic  context-free. 

Proof: The proof is by construction. {$} is regular. So, if L is regular, then so is L$
(since the regular languages are closed under concatenation). So there is a DFSM
M that accepts it. Using the construction that we used in the proof of Theorem
13.1 to show that every regular language is context-free, construct, from M, a PDA
P that accepts L$. P will be deterministic.

THEOREM  13.15  Every  Deterministic  CFL  has  an  Unambiguous  Grammar 

Theorem:  For  every  deterministic  context-free  language  there  exists  an  unambigu¬ 
ous  grammar. 

Proof: If a language L is deterministic context-free, then there exists a deterministic
PDA M that accepts L$. We prove the theorem by construction of an unambiguous
grammar G such that L(M) = L(G). We construct G using approximately the same
technique that we used to build a grammar from a PDA in the proof of Theorem 12.2.
The algorithm PDAtoCFG that we presented there proceeded in two steps:

1. Invoke convertPDAtorestricted(M) to build M', an equivalent PDA in restricted
normal form.

2. Invoke buildgrammar(M'), to build an equivalent grammar G.

It is straightforward to show that, if M' is deterministic, then the grammar G that
buildgrammar constructs will be unambiguous: G produces derivations that mimic
the operation of M'. Since M' is deterministic, on any input w it can follow only one
path. So G will be able to produce only one leftmost derivation for w. Thus w has
only one parse tree. If every string in L(G) has a single parse tree, then G is
unambiguous. Since M' accepts L$, G will generate L$. But we can build, from G, a
grammar G' that generates L by substituting ε for $ in each rule in which $ occurs.

So it remains to show that, from any deterministic PDA M, it is possible to
build an equivalent PDA M' that is in restricted normal form and is still
deterministic. This can be done using the algorithm convertPDAtodetnormalform,
which is described in the proof, presented in D.2, of Theorem 13.10 (that the
deterministic context-free languages are closed under complement). If M is
deterministic, then the PDA that is returned by convertPDAtodetnormalform(M) will
be both deterministic and in restricted normal form.

So  the  construction  that  proves  the  theorem  is: 

buildunambiggrammar(M: deterministic PDA) =

1. Let G = buildgrammar(convertPDAtodetnormalform(M)).

2.  Let  G'  be  the  result  of  substituting  e  for  $  in  each  rule  in  which  $  occurs. 

3.  Return  G'. 
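
Step 2 is simple enough to sketch directly. The algorithms buildgrammar and convertPDAtodetnormalform are not reproduced here, so the example starts from a hand-written grammar for {a^n b^n : n ≥ 0}$. The grammar encoding (a dictionary from each nonterminal to a list of right-hand sides, with the empty tuple standing for ε) and the function name are assumptions of this illustration.

```python
def substitute_epsilon_for_marker(rules, marker="$"):
    """Return G' from G by replacing every occurrence of the marker with ε.

    rules maps each nonterminal to a list of right-hand sides; a right-hand
    side is a tuple of symbols, and the empty tuple denotes ε. Replacing a
    symbol by ε inside a tuple just means dropping it.
    """
    return {lhs: [tuple(sym for sym in rhs if sym != marker) for rhs in alternatives]
            for lhs, alternatives in rules.items()}


# G generates {a^n b^n : n >= 0}$ via S -> T $, T -> a T b | ε.
g = {"S": [("T", "$")], "T": [("a", "T", "b"), ()]}
g_prime = substitute_epsilon_for_marker(g)
print(g_prime)
```

Because $ occurs only as the final symbol of every string in L$, dropping it from the rules removes exactly that final symbol from each generated string.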

13.6 Ogden's Lemma •

The  context-free  Pumping  Theorem  is  a  useful  tool  for  showing  that  a  language  is  not 
context-free.  However,  there  are  many  languages  that  are  not  context-free  but  that 
cannot  be  proven  so  just  with  the  Pumping  Theorem.  In  this  section  we  consider  a 
more  powerful  technique  that  may  be  useful  in  those  cases. 

Recall  that  the  Pumping  Theorem  for  regular  languages  imposed  the  constraint 
that  the  pumpable  region  y  had  to  fall  within  the  first  k  characters  of  any  “long”  string 
w.  We  exploited  that  fact  in  many  of  our  proofs.  But  notice  that  the  Pumping  Theorem 
for context-free languages imposes no similar constraint. The two pumpable regions, v
and y, must be reasonably close together, but, as a group, they can fall anywhere in w.
Sometimes  there  is  a  region  that  is  pumpable,  even  though  other  regions  aren’t,  and 
this  can  happen  even  in  the  case  of  long  strings  drawn  from  languages  that  are  not 
context-free. 


EXAMPLE  13.11  Sometimes  Pumping  Isn't  Strong  Enough 

Let L = {a^i b^i c^j : i, j ≥ 0, i ≠ j}. We could attempt to use the context-free
Pumping Theorem to show that L is not context-free. Let w = a^k b^k c^{k+k!}. (The
reason for this choice will be clear soon.) Divide w into three regions, the a's,
the b's, and the c's, which we'll call regions 1, 2, and 3, respectively. If either
v or y contains two or more distinct symbols, then set q to 2. The resulting string
will have letters out of order and thus not be in L. We consider the remaining
possibilities:

• (1, 1), (2, 2), (1, 3), (2, 3): Set q to 2. The number of a's will no longer equal
the number of b's, so the resulting string is not in L.

• (1, 2): If |v| ≠ |y| then set q to 2. The number of a's will no longer equal the
number of b's, so the resulting string is not in L. If |v| = |y| then set q to
(k!/|v|) + 1. Note that (k!/|v|) must be an integer since |v| ≤ k. The string
that results from pumping is a^X b^X c^{k+k!}, where X = k + (q - 1)·|v|
= k + (k!/|v|)·|v| = k + k!. So the number of a's and of b's equals the
number of c's. This string is not in L. So far, the proof is going well. But now
we must consider:

• (3, 3): Pumping in will result in even more c's than a's and b's, so it will
produce a string that is still in L. And, while pumping out can reduce the number
of c's, it can't reduce it all the way down to k because |vxy| ≤ k. So the
maximum number of c's that can be pumped out is k, which would result in a string
with k! c's. But, as long as k ≥ 3, k! > k. So the resulting string is in L and we
have failed to show that L is not context-free.

What we need is a way to prevent v and y from falling in the c region of w.
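
The arithmetic in cases (1, 2) and (3, 3) can be checked numerically. The sketch below is not part of the proof: it fixes k = 4 as an arbitrary illustrative value (the real k is determined by the grammar), and the helper names pump and in_L are introduced here only for the illustration. It tracks the symbol counts of w = a^k b^k c^{k+k!} when v and y each lie inside a single region.

```python
from math import factorial


def pump(k, region_v, region_y, len_v, len_y, q):
    """Counts (#a, #b, #c) after pumping w = a^k b^k c^(k+k!) q times.

    region_v and region_y are 0, 1 or 2 for the a, b or c block; v and y
    are each assumed to lie inside a single block.
    """
    counts = [k, k, k + factorial(k)]
    counts[region_v] += (q - 1) * len_v
    counts[region_y] += (q - 1) * len_y
    return tuple(counts)


def in_L(counts):
    """L = {a^i b^i c^j : i != j}, expressed on symbol counts."""
    a, b, c = counts
    return a == b and a != c


k = 4
# Case (1, 2) with |v| = |y| = 2: choose q = k!/|v| + 1.
q = factorial(k) // 2 + 1
case_12 = pump(k, 0, 1, 2, 2, q)
# Case (3, 3) with |v| = 1 and |y| = 3: every q leaves the string in L.
case_33 = [pump(k, 2, 2, 1, 3, q) for q in range(0, 6)]
```

With these values case_12 has equal numbers of a's, b's and c's, so the pumped string falls out of L, while every entry of case_33 stays in L, which is exactly where the Pumping Theorem argument stalls.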


Ogden’s  Lemma  is  a  generalization  of  the  Pumping  Theorem.  It  lets  us  mark 
some  number  of  symbols  in  our  chosen  string  w  as  distinguished.  Then  at  least  one 
of  v  and  y  must  contain  at  least  one  distinguished  symbol.  So,  for  example,  we  could 
complete the proof that we started in Example 13.11 if we could force at least one of
v or y to contain at least one a.

THEOREM 13.16 Ogden's Lemma

Theorem: If L is a context-free language, then:

∃k ≥ 1 (∀ strings w ∈ L, where |w| ≥ k, if we mark at least k symbols of w as
distinguished then:

(∃u, v, x, y, z (w = uvxyz,

vy contains at least one distinguished symbol,
vxy contains at most k distinguished symbols, and
∀q ≥ 0 (uv^q x y^q z is in L)))).

Proof: The proof is analogous to the one we did for the context-free Pumping
Theorem except that we consider only paths that generate the distinguished symbols.
If L is context-free, then it is generated by some context-free grammar
G = (V, Σ, R, S) with n nonterminal symbols and branching factor b. Let k be
b^{n+1}. Let w be any string in L(G) such that |w| ≥ k. A parse tree T for w might
look like the one shown in Figure 13.6.

Suppose that we mark at least b^{n+1} symbols as distinguished. The distinguished
symbols are marked with a ✓. (Ignore the fact that there aren't enough of them in the
picture. Its only role is to make it easier to visualize the process.) Call the sequence of
distinguished nodes the distinguished subsequence of w. In this example, that is bje.
Note that the distinguished subsequence is not necessarily a substring. The characters
in it need not be contiguous. The length of the distinguished subsequence is at least
b^{n+1}. We can now mark the nonleaf nodes that branched in a way that enabled the
distinguished subsequence to grow to at least length b^{n+1}. Mark every nonleaf node
that has at least two daughters that contain a distinguished leaf. In this example, we
mark X2 and X1, as indicated in the figure. It is straightforward to prove by
induction that T must contain at least one path that contains at least n + 1 marked


FIGURE 13.6 A parse tree with some symbols marked as distinguished.

nonleaf nodes since its yield contains b^{n+1} distinguished symbols. Choose one such
path such that there is no longer one. That path must contain at least two nodes
labeled with the same nonterminal symbol. Choose the two nodes that are labeled with
the bottom-most pair of repeated marked nonterminals. Call the lower one N and the
higher one M. In the example, M is X1 and N is X2. As shown in the diagram, divide w
into uvxyz, such that x is the yield of N and vxy is the yield of M. Now observe that:

• vy contains at least one distinguished symbol because the root of the subtree
with yield vxy has at least two daughters that contain distinguished symbols.
One of them may be in the subtree whose yield is x, but that leaves at least
one that must be in either v or y. There may be distinguished symbols in both,
although, as in our example T, that is not necessary.

• vxy contains at most k (b^{n+1}) distinguished symbols because there are at most
n + 1 marked internal nodes on a longest path in the subtree that dominates
it. Only marked internal nodes create branches that lead to more than one
distinguished symbol, and no internal node can create more than b branches.

• ∀q ≥ 0 (uv^q x y^q z is in L), by the same argument that we used in the proof of
the context-free Pumping Theorem.

Notice  that  the  context-free  Pumping  Theorem  describes  the  special  case  in  which 
all  symbols  of  the  string  w  are  marked. 

Ogden’s  Lemma  is  the  tool  that  we  need  to  complete  the  proof  that  we  started  in 
Example  13.11. 


EXAMPLE  13.12  Ogden's  Lemma  May  Work  When  Pumping  Doesn't 

Now we can use Ogden's Lemma to complete the proof that L = {a^i b^i c^j :
i, j ≥ 0, i ≠ j} is not context-free. Let w = a^k b^k c^{k+k!}. Mark all the a's in w
as distinguished. If either v or y contains two or more distinct symbols, then set
q to 2. The resulting string will have letters out of order and thus not be in L. We
consider the remaining possibilities:

• (1, 1), (1, 3): Set q to 2. The number of a's will no longer equal the number of
b's, so the resulting string is not in L.

• (1, 2): If |v| ≠ |y| then set q to 2. The number of a's will no longer equal the
number of b's, so the resulting string is not in L. If |v| = |y| then set q to
(k!/|v|) + 1. Note that (k!/|v|) must be an integer since |v| ≤ k. The string
that results from pumping is
a^{k+(q-1)|v|} b^{k+(q-1)|v|} c^{k+k!} = a^{k+k!} b^{k+k!} c^{k+k!}.
So the number of a's and of b's equals the number of c's. This string is not in L.

• (2, 2), (2, 3), (3, 3) fail to satisfy the requirement that at least one symbol in vy
be marked as distinguished.

There is no way to divide w into uvxyz such that all the conditions of Ogden's
Lemma are met. So L is not context-free.
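
The difference between the two arguments can be observed by brute force on a small instance. The following sketch is an illustration, not a proof: it fixes k = 3 (an arbitrary choice, since the true k depends on the grammar), only tries q up to 7, and the function name unbeaten is introduced here. It enumerates every way of cutting w into uvxyz, keeps the cuts that satisfy the lemma's two marking conditions, and reports those for which no tried q pumps the string out of L. Marking every symbol reproduces the conditions of the Pumping Theorem.

```python
import re


def in_L(s):
    """Membership in L = {a^i b^i c^j : i, j >= 0, i != j}."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    if m is None:
        return False
    na, nb, nc = (len(g) for g in m.groups())
    return na == nb and na != nc


def unbeaten(w, k, marked, max_q):
    """(v, y) pairs that meet the marking conditions on w but for which
    no q in 0..max_q yields a string outside L."""
    n = len(w)
    result = []
    for i in range(n + 1):
        for j in range(i, n + 1):
            for l in range(j, n + 1):
                for m in range(l, n + 1):
                    in_vy = sum(1 for p in marked if i <= p < j or l <= p < m)
                    in_vxy = sum(1 for p in marked if i <= p < m)
                    if in_vy < 1 or in_vxy > k:
                        continue  # conditions of the lemma are not met
                    u, v, x, y, z = w[:i], w[i:j], w[j:l], w[l:m], w[m:]
                    if all(in_L(u + v * q + x + y * q + z)
                           for q in range(max_q + 1)):
                        result.append((v, y))
    return result


k = 3                                   # illustrative value only
w = "a" * k + "b" * k + "c" * (k + 6)   # k! = 6 when k = 3
pumping = unbeaten(w, k, set(range(len(w))), max_q=7)  # every symbol marked
ogden = unbeaten(w, k, set(range(k)), max_q=7)         # only the a's marked
```

In this instance the only decompositions that survive under the Pumping Theorem conditions are those in which v and y consist entirely of c's, and none survive once only the a's are distinguished.
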
13.7 Parikh's Theorem •

Suppose that we consider a language L not from the point of view of the exact
strings it contains but instead by simply counting, for each string w in L, how many
instances of each character in Σ w contains. So, from this perspective, the strings
aaabbba and abababa are the same. If Σ is {a, b}, then both strings can be
described with the pair (4, 3) since they contain 4 a's and 3 b's. We can build such
descriptions by defining a family of functions ψ_Σ, with domain Σ* and range
{(i_1, i_2, ..., i_k)}, where k = |Σ|:

ψ_Σ(w) = (i_1, i_2, ..., i_k), where, for all j, i_j = the number of occurrences in
w of the j-th element of Σ.

So, if Σ = {a, b, c, d}, then ψ_Σ(aabbbbddd) = (2, 4, 0, 3).

Now consider some language L, which is a set of strings over some alphabet Σ.
Instead of considering L as a set of strings, we can consider it as the set of vectors
that are produced by applying ψ_Σ to the strings it contains. To do this, we define
another family of functions Ψ_Σ, with domain 𝒫(Σ*) and range 𝒫({(i_1, i_2, ..., i_k)}):

Ψ_Σ(L) = {(i_1, i_2, ..., i_k) : ∃w ∈ L (ψ_Σ(w) = (i_1, i_2, ..., i_k))}.

If Σ is fixed, then there is a single function ψ and a single function Ψ. In that case,
we will omit Σ and refer to the functions just as ψ and Ψ.

We will say that two languages L1 and L2, over the alphabet Σ, are letter-equivalent
iff Ψ_Σ(L1) = Ψ_Σ(L2). In other words, L1 and L2 contain the same strings if we
disregard the order in which the symbols occur in the strings.


EXAMPLE  13.13  Letter  Equivalence 

Let Σ = {a, b}. Then, for example, ψ(a) = (1, 0), ψ(b) = (0, 1), ψ(ab) = (1, 1),
ψ(aaabbbb) = (3, 4).

Now consider Ψ:

• Let L1 = A^nB^n = {a^n b^n : n ≥ 0}. Then Ψ(L1) = {(i, i) : 0 ≤ i}.

• Let L2 = (ab)*. Then Ψ(L2) = {(i, i) : 0 ≤ i}.

• Let L3 = {a^n b^n a^n : n ≥ 0}. Then Ψ(L3) = {(2i, i) : 0 ≤ i}.

• Let L4 = {a^{2n} b^n : n ≥ 0}. Then Ψ(L4) = {(2i, i) : 0 ≤ i}.

• Let L5 = (aba)*. Then Ψ(L5) = {(2i, i) : 0 ≤ i}.

L1 and L2 are letter-equivalent. So are L3, L4, and L5.
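
The functions ψ and Ψ are easy to experiment with on finite samples. The sketch below assumes that each language is cut off at an arbitrary bound N, since an infinite language cannot be enumerated; the names psi and Psi are simply ASCII spellings chosen here.

```python
def psi(w, sigma):
    """Vector of occurrence counts of each symbol of sigma in w,
    listed in the order in which sigma gives the symbols."""
    return tuple(w.count(c) for c in sigma)


def Psi(strings, sigma):
    """Image of a finite collection of strings under psi."""
    return {psi(w, sigma) for w in strings}


N = 6  # sample bound, an arbitrary choice
L1 = ["a" * n + "b" * n for n in range(N)]
L2 = ["ab" * n for n in range(N)]
L3 = ["a" * n + "b" * n + "a" * n for n in range(N)]
L4 = ["a" * (2 * n) + "b" * n for n in range(N)]
L5 = ["aba" * n for n in range(N)]

images = [Psi(L, "ab") for L in (L1, L2, L3, L4, L5)]
```

On these samples the first two images coincide, as do the last three, which agrees with the letter-equivalence classes in the example. Agreement on a finite sample is of course only evidence, not a proof, of letter-equivalence.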


Just looking at the five languages we considered in Example 13.13, we can observe
that it is possible for two languages with different formal properties (for example a
regular language and a context-free but not regular one) to be letter-equivalent to
each other. L3 is not context-free. L4 is context-free but not regular. L5 is regular. But
the three of them are letter-equivalent to each other.

Parikh's Theorem, which we are about to state formally and then prove, tells us
that that example is far from unique. In fact, given any context-free language L,
there exists some regular language L' such that L and L' are letter-equivalent to
each other. So A^nB^n is letter-equivalent to (ab)*. The language {a^{2n} b^n : n ≥ 0} is
letter-equivalent to (aba)* and to (aab)*. And PalEven = {ww^R : w ∈ {a, b}*} is
letter-equivalent to (aa ∪ bb)* since Ψ(PalEven) = Ψ((aa ∪ bb)*)
= {(2i, 2j) : 0 ≤ i ∧ 0 ≤ j}. The proof of Parikh's Theorem is similar to the proofs
we have already given for the context-free Pumping Theorem and for Ogden's
Lemma. It is based on the fact that, if L is context-free, then all the strings in L can
be formed by starting with one of a finite set of "short" strings in L and then
pumping in some finite number of strings (v, y pairs), all of which are chosen from a
finite library of possible values for v and y.

An  interesting  application  of  Parikh’s  theorem  is  in  the  proof  of  a  corollary  that  tells 
us  that  every  context-free  language  over  a  single  character  alphabet  must  also  be  regular. 
We  will  add  that  corollary  to  our  kit  of  tools  for  proving  that  a  language  is  not  context- 
free  (by  showing  that,  if  it  were,  then  it  would  also  be  regular  but  we  know  that  it  isn't). 

Notice, by the way, that while we are about to prove that if L is context-free
then it is letter-equivalent to some regular language, the converse of that claim is
false. A language can be letter-equivalent to some regular language and not be
context-free. We prove this by considering two of the languages from Example
13.13: L3 = {a^n b^n a^n : n ≥ 0} is not context-free, but it is letter-equivalent to
L5 = (aba)*, which is regular.

THEOREM  13.17  Parikh's  Theorem 

Theorem:  Every  context-free  language  is  letter-equivalent  to  some  regular  language. 

Proof:  The  proof  follows  an  argument  similar  to  the  one  we  used  to  prove  the 
context-free  Pumping  Theorem.  It  is  given  in  D.3. 

An algebraic approach to thinking about what ψ and Ψ are doing is the following:
We can describe the standard way of looking at strings as starting with a set S of
primitive strings (ε and all the one-character strings drawn from Σ) and the single
operation of concatenation, which is associative and has ε as an identity. Σ* is then the
closure of S under concatenation. ψ_Σ maps elements of Σ* to elements of
{(i_1, i_2, ..., i_k)}, on which is defined the operation of pairwise addition, which is
associative and has (0, 0, ..., 0) as an identity. But addition is also commutative, while
concatenation is not. So, while, if we concatenate strings, it matters what order we do it in,
if we consider the images of strings under ψ, the order in which we combine them
doesn't matter. Parikh's theorem can be described as a special case of more general
properties of commutative systems.

When Σ contains just a single character, the order of the characters in a string is
irrelevant. So we have the following result:
THEOREM 13.18 Every CFL Over A Single-Character Alphabet is Regular

Theorem: Any context-free language over a single-character alphabet is regular.

Proof: By Parikh's Theorem, if L is context-free then L is letter-equivalent to some
regular language L'. Since the order of characters has no effect on strings when
all characters are the same, L = L'. Since L' is regular, so is L.


EXAMPLE 13.14 A^nA^n is Regular

Let Σ = {a, b} and consider L = A^nB^n = {a^n b^n : n ≥ 0}. A^nB^n is context-free
but not regular.

Now let Σ = {a} and L' = {a^n a^n : n ≥ 0}

= {a^{2n} : n ≥ 0}

= {w ∈ {a}* : |w| is even}. L' is regular.


EXAMPLE 13.15 PalEven is Regular if Σ = {a}

Let Σ = {a, b} and consider L = PalEven = {ww^R : w ∈ {a, b}*}. PalEven is
context-free but not regular.

Now let Σ = {a} and L' = {ww^R : w ∈ {a}*}

= {w ∈ {a}* : |w| is even}. L' is regular.
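
Both one-letter languages in Examples 13.14 and 13.15 collapse to the even-length strings of a's, which the regular expression (aa)* describes. The check below is a sketch over a finite sample; the bound N is an arbitrary assumption and the variable names are chosen here for illustration.

```python
import re

N = 8  # sample bound, an arbitrary choice

# {a^n a^n : n >= 0}, truncated at n < N.
anan = {"a" * n + "a" * n for n in range(N)}

# {w w^R : w in {a}*}, truncated at |w| < N.
pal_even = {w + w[::-1] for w in ("a" * n for n in range(N))}

# Strings of a's of length < 2N that match the regular expression (aa)*.
even = {"a" * m for m in range(2 * N) if re.fullmatch(r"(aa)*", "a" * m)}
```

All three sets coincide on the sample, which is what Theorem 13.18 leads us to expect.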


When we are considering only a single-letter alphabet, we can use Theorem 13.18 to
show that a language that we already know not to be regular cannot be context-free either.


EXAMPLE  13.16  The  Prime  Number  of  a's  Language  is  Not  Context-Free 

Consider again Prime_a = {a^n : n is prime}. Prime_a is not context-free. If it were,
then, by Theorem 13.18, it would also be regular. But we showed in Example 8.13
that it is not regular. So it is not context-free either.


13.8  Functions  on  Context-Free  Languages  • 

In Section 13.4, we saw that the context-free languages are closed under some
important functions, including concatenation, union, and Kleene star. But their closure
properties are substantially weaker than are the closure properties of the regular languages.
In this section, we consider some other functions that can be applied to languages and
we ask whether the context-free languages are closed under them. The proof strategies
we will use are the same as the ones we used for the regular languages and for the
results we have already obtained for the context-free languages:

• To show that the context-free languages are closed under some function f, we will
show an algorithm that constructs, given any context-free language L, either a
grammar or a PDA that describes f(L).

• To show that the context-free languages are not closed under some function f, we will
exhibit a counterexample, i.e., a language L where L is context-free but f(L) is not.


EXAMPLE  13.17  Firstchars 

Consider again the function firstchars(L) = {w : ∃y ∈ L (y = cx ∧ c ∈ Σ_L ∧ x ∈
Σ_L* ∧ w ∈ c*)}. The context-free languages are closed under firstchars(L). In
fact, if L is context-free then firstchars(L) is regular. We know that this must be
true by an argument similar to the one we used in Example 8.20 to show that the
regular languages are closed under firstchars. There must be some finite set of
characters {c1, c2, ..., cn} that can begin strings in L (since Σ_L is finite). So there
exists some regular expression of the following form that describes firstchars(L):

c1* ∪ c2* ∪ ... ∪ cn*.

We can also show a constructive proof that firstchars(L) is context-free if L is.

If L is a context-free language, then there is some context-free grammar
G = (V, Σ, R, S) that generates it. We construct a context-free grammar
G' = (V', Σ', R', S') that generates firstchars(L):

1. Convert G to Greibach normal form using the procedure
converttoGreibach, defined in D.1.

2. Remove from G all unreachable nonterminals and all rules that mention
them.

3. Remove from G all unproductive nonterminals and all rules that mention
them.

4. Initialize V' to {S'}, Σ' to {}, and R' to {}.

5. For each remaining rule in G of the form S → cγ do:

5.1. Add to R' the rules S' → C_c, C_c → c C_c and C_c → ε.

5.2. Add to Σ' the symbol c.

5.3. Add to V' the symbol C_c.

6. Return G'.

The idea behind this construction is that, if G is in Greibach normal form, then,
each time a rule is applied, the next terminal symbol is generated. So, if we look at
G's start symbol S and ask what terminals any of its rules can generate, we'll know
exactly what terminals strings in L(G) can start with.
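
Steps 4 through 6 of the construction can be sketched as follows. The sketch assumes that steps 1 through 3 have already been carried out, so the input is in Greibach normal form with no unreachable or unproductive nonterminals; those steps are not implemented here. The rule representation, the function name firstchars_grammar, the naming scheme "C_" + c for the new nonterminals, and the sample grammar are all choices made for this illustration.

```python
def firstchars_grammar(rules, start="S"):
    """Build (V', Sigma', R', S') from a grammar given as (lhs, rhs) pairs.

    Assumes the grammar is already in Greibach normal form and has no
    useless nonterminals, so every rule for `start` begins with a terminal
    that really does begin some string of the language.
    """
    v_new, sigma_new, r_new = {"S'"}, set(), []
    for lhs, rhs in rules:
        if lhs != start or not rhs:
            continue
        c = rhs[0]            # in Greibach normal form this is a terminal
        cc = "C_" + c         # the new nonterminal C_c
        if c not in sigma_new:
            # S' -> C_c, C_c -> c C_c, C_c -> epsilon (empty tuple)
            r_new += [("S'", (cc,)), (cc, (c, cc)), (cc, ())]
            sigma_new.add(c)
            v_new.add(cc)
    return v_new, sigma_new, r_new, "S'"


# Hypothetical grammar in Greibach normal form, for illustration only.
g = [("S", ("a", "S", "B")), ("S", ("a",)), ("S", ("b",)), ("B", ("b",))]
v2, sigma2, r2, s2 = firstchars_grammar(g)
```

For the sample grammar the result generates a* ∪ b*, since strings of the language can begin with either a or b.
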
EXAMPLE 13.18 Maxstring

Consider again the function maxstring(L) = {w : w ∈ L and ∀z ∈ Σ* (z ≠ ε
→ wz ∉ L)}. The context-free languages are not closed under maxstring(L). The
proof is by counterexample. Consider the language L = {a^i b^j c^k : k ≤ i or k ≤ j}.

L is context-free but maxstring(L) is not. We leave the proof of this as an exercise.
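
To get a feel for what maxstring does, it can be computed directly when the language is finite. The sketch below works only for finite sets of strings, and the sample language is hypothetical; it is not the counterexample language L, whose analysis is left as the exercise.

```python
def maxstring(language):
    """maxstring of a finite language given as a set of strings: keep w
    unless some other member of the language has w as a proper prefix."""
    return {w for w in language
            if not any(u != w and u.startswith(w) for u in language)}


# Hypothetical finite language, for illustration only.
sample = {"a", "ab", "abc", "b"}
result = maxstring(sample)
```

Here a and ab are dropped because each can be extended to a longer string in the sample, while abc and b cannot.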

Exercises 

1. For each of the following languages L, state whether L is regular, context-free but
not regular, or not context-free and prove your answer.

a. {xy : x, y ∈ {a, b}* and |x| = |y|}.

b. {(ab)^n a^n b^n : n > 0}.

c. {x#y : x, y ∈ {0, 1}* and x ≠ y}.

d. {a^i b^n : i, n > 0 and i = n or i = 2n}.

e. {wx : |w| = 2·|x| and w ∈ a+b+ and x ∈ a+b+}.

f. {a^n b^m c^k : n, m, k ≥ 0 and m ≤ min(n, k)}.

g. {xyx^R : x ∈ {0, 1}* and y ∈ {0, 1}*}.

h. {xwx^R : x, w ∈ {a, b}* and |x| = |w|}.

i. {ww^Rw : w ∈ {a, b}*}.

j. {wxw : |w| = 2·|x| and w ∈ {a, b}* and x ∈ {c}*}.

k. {a^i : i ≥ 0}{b^i : i ≥ 0}{a^i : i ≥ 0}.

l. {x ∈ {a, b}* : |x| is even and the first half of x has one more a than does the
second half}.

m. {w ∈ {a, b}* : #a(w) = #b(w) and w does not contain either the substring
aaa or abab}.

n. {a^n b^{2n} c^m : n, m ≥ 0} ∩ {a^n b^m c^m : n, m ≥ 0}.

o. {x c y : x, y ∈ {0, 1}* and y is a prefix of x}.

p. {w : w = uu^R or w = ua^n : n = |u|, u ∈ {a, b}*}.

q. L(G), where G =
S → aSa
S → SS
S → ε

r. {w ∈ (A-Z, a-z, ., blank)* : there exists at least one duplicated, capitalized word
in w}. For example, the string, The history of China can be viewed from
the perspective of an outsider or of someone living in China. ∈ L.

s. ¬L0, where L0 = {ww : w ∈ {a, b}*}.

t. L*, where L = {0*1^i0*1^i0* : i ≥ 0}.

u. ¬A^nB^n.

v. {ba^jb : j = n^2 for some n ≥ 0}. For example, baaaab ∈ L.

w. {w ∈ {a, b, c, d}* : #b(w) ≥ #c(w) ≥ #d(w) ≥ 0}.
2. Let L = {w ∈ {a, b}* : the first, middle, and last characters of w are identical}.

a. Show a context-free grammar for L.

b. Show a natural PDA that accepts L.

c. Prove that L is not regular.

3. Let L = {a^n b^m c^n d^m : n, m ≥ 1}. L is interesting because of its similarity to a useful
fragment of a typical programming language in which one must declare procedures
before they can be invoked. The procedure declarations include a list of the formal
parameters. So now imagine that the characters in a^n correspond to the formal
parameter list in the declaration of procedure 1. The characters in b^m correspond to the
formal parameter list in the declaration of procedure 2. Then the characters in c^n and
d^m correspond to the parameter lists in an invocation of procedure 1 and procedure 2
respectively, with the requirement that the number of parameters in the invocations
match the number of parameters in the declarations. Show that L is not context-free.

4. Without using the Pumping Theorem, prove that L = {w ∈ {a, b, c}* : #a(w) =
#b(w) = #c(w) and #a(w) > 50} is not context-free.

5. Give an example of a context-free language L (≠ Σ*) that contains a subset L1
that is not context-free. Prove that L is context-free. Describe L1 and prove that it
is not context-free.

6. Let L1 = L2 ∩ L3.

a. Show values for L1, L2, and L3 such that L1 is context-free but neither L2 nor L3 is.

b. Show values for L1, L2, and L3 such that L2 is context-free but neither L1 nor L3 is.

7. Give an example of a context-free language L, other than one of the ones in the
book, where ¬L is not context-free.

8. Theorem 13.7 tells us that the context-free languages are closed under
intersection with the regular languages. Prove that the context-free languages are also
closed under union with the regular languages.

9. Complete the proof that the context-free languages are not closed under
maxstring by showing that L = {a^i b^j c^k : k ≤ i or k ≤ j} is context-free but
maxstring(L) is not context-free.

10. Use the Pumping Theorem to complete the proof, started in L.3.3, that English is
not context-free if we make the assumption that subjects and verbs must match in
a "respectively" construction.

11. In N.1.2, we give an example of a simple musical structure that cannot be
described with a context-free grammar. Describe another one, based on some
musical genre with which you are familiar. Define a sublanguage that captures exactly
that phenomenon. In other words, ignore everything else about the music you are
considering and describe a set of strings that meets the one requirement you are
studying. Prove that your language is not context-free.

12. Define the leftmost maximal P subsequence m of a string w as follows:

• P must be a nonempty set of characters.

• A string S is a P subsequence of w iff S is a substring of w and S is composed
entirely of characters in P. For example 1, 0, 10, 01, 11, 011, 101, 111, 1111, and
1011 are {0, 1} subsequences of 2312101121111.

• Let S be the set of all P subsequences of w such that, for each element t of S,
there is no P subsequence of w longer than t. In the example above, S = {1111,
1011}.

• Then m is the leftmost (within w) element of S. In the example above, m = 1011.

a. Let L = {w ∈ {0-9}* : if y is the leftmost maximal {0, 1} subsequence of w
then |y| is even}. Is L regular (but not context free), context free or neither?
Prove your answer.

b. Let L = {w ∈ {a, b, c}* : the leftmost maximal {a, b} subsequence of w
starts with a}. Is L regular (but not context free), context free or neither?
Prove your answer.

13.  Are  the  context-free  languages  closed  under  each  of  the  following  functions? 
Prove  your  answer. 

a. chop(L) = {w : ∃x ∈ L (x = x1cx2 ∧ x1 ∈ Σ_L* ∧ x2 ∈ Σ_L* ∧ c ∈ Σ_L ∧
|x1| = |x2| ∧ w = x1x2)}

b. mix(L) = {w : ∃x, y, z (x ∈ L, x = yz, |y| = |z|, w = yz^R)}

c. pref(L) = {w : ∃x ∈ Σ* (wx ∈ L)}

d. middle(L) = {x : ∃y, z ∈ Σ* (yxz ∈ L)}

e. Letter substitution

f. shuffle(L) = {w : ∃x ∈ L (w is some permutation of x)}

g. copyreverse(L) = {w : ∃x ∈ L (w = xx^R)}

14. Let alt(L) = {x : ∃y, n (y ∈ L, |y| = n, n > 0, y = a1 ... an, ∀i ≤ n (ai ∈ Σ), and
x = a1a3a5 ... ak, where k = (if n is even then n - 1 else n))}.

a. Consider L = a^nb^n. Clearly describe L1 = alt(L).

b. Are the context-free languages closed under the function alt? Prove your answer.

15. Let L1 = {a^n b^m : n ≥ m}. Let R1 = {(a ∪ b)* : there is an odd number of a's
and an even number of b's}. Use the construction that is described in the proof
of Theorem 13.7 to build a PDA that accepts L1 ∩ R1.

16.  Let  T  be  a  set  of  languages  defined  as  follows: 


T = {L : L is a context-free language over the alphabet {a, b, c}
and, if x ∈ L, then |x| ≡_3 0}.

Let P be the following function on languages:

P(L) = {w : ∃x ∈ {a, b, c} and ∃y ∈ L and y = xw}.

Is the set T closed under P? Prove your answer.

17.  Show  that  the  following  languages  are  deterministic  context-free: 

a. {w : w ∈ {a, b}* and each prefix of w has at least as many a's as b's}

b. {a^n b^n : n ≥ 0} ∪ {a^n c^n : n ≥ 0}

18. Show that L = {a^n b^n : n ≥ 0} ∪ {a^n b^{2n} : n ≥ 0} is not deterministic context-free.

19.  Are  the  deterministic  context-free  languages  closed  under  reverse?  Prove  your 
answer. 

20.  Prove  that  each  of  the  following  languages  is  not  context-free.  (Hint:  Use 
Ogden’s  Lemma.) 
a. {a^i b^j c^k : i ≥ 0, j ≥ 0, k ≥ 0, and i ≠ j ≠ k}

b. {a^i b^j c^k d^n : i ≥ 0, j ≥ 0, k ≥ 0, n ≥ 0, and (i = 0 or j = k = n)}

21. Let Ψ(L) be as defined in Section 13.7, in our discussion of Parikh's Theorem.
For each of the following languages L, first state what Ψ(L) is. Then give a
regular language that is letter-equivalent to L.

a. Bal = {w ∈ {), (}* : the parentheses are balanced}

b. Pal = {w ∈ {a, b}* : w is a palindrome}

c. {x^R#y : x, y ∈ {0, 1}* and x is a substring of y}

22.  For  each  of  the  following  claims,  state  whether  it  is  True  or  False.  Prove  your  answer. 

a. If L1 and L2 are two context-free languages, L1 - L2 must also be context-free.

b. If L1 and L2 are two context-free languages and L1 = L2L3, then L3 must
also be context-free.

c. If L is context-free and R is regular, R - L must be context-free.

d. If L1 and L2 are context-free languages and L1 ⊆ L ⊆ L2, then L must be
context-free.

e. If L1 is a context-free language and L2 ⊆ L1, then L2 must be context-free.

f. If L1 is a context-free language and L2 ⊆ L1, it is possible that L2 is regular.

g.  A  context-free  grammar  in  Chomsky  normal  form  is  always  unambiguous. 


CHAPTER  14 


Algorithms  and  Decision  Procedures 
for  Context-Free  Languages 

Many questions that we could answer when asked about regular languages
are unanswerable for context-free ones. But a few important questions can
be answered and we have already presented a useful collection of
algorithms that can operate on context-free grammars and PDAs. We'll present a few
more here.

14.1  The  Decidable  Questions 

Fortunately,  the  most  important  questions  (i.e.,  the  ones  that  must  be  answerable  if 
context-free  grammars  are  to  be  of  any  practical  use)  are  decidable. 


14.1.1  Membership 

We begin with the most fundamental question, "Given a language L and a string w, is w
in L?" Fortunately this question can be answered for every context-free language. By
Theorem 12.1, for every context-free language L, there exists a PDA M such that M
accepts L. But we must be careful. As we showed in Section 12.4, PDAs are not guaranteed
to halt. So the mere existence of a PDA that accepts L does not guarantee the existence
of a procedure that decides it (i.e., always halts and says yes or no appropriately).

It  turns  out  that  there  are  two  alternative  approaches  to  solving  this  problem,  both 
of  which  work: 

• Use a grammar: Using facts about every derivation that is produced by a grammar
in Chomsky normal form, we can construct an algorithm that explores a finite
number of derivation paths and finds one that derives a particular string w iff such a
path exists.

• Use a PDA: While not all PDAs halt, it is possible, for any context-free language L,
to craft a PDA M that is guaranteed to halt on all inputs and that accepts all strings
in L and rejects all strings that are not in L.

Using  a  Grammar  to  Decide 

We  begin  by  considering  the  first  alternative.  We  show  a  straightforward  algorithm  for 
deciding  whether  a  string  w  is  in  a  language  L : 

decideCFLusingGrammar(L: CFL, w: string) =

1. If L is specified as a PDA, use PDAtoCFG, presented in the proof of Theorem
12.2, to construct a grammar G such that L(G) = L(M).

2. If L is specified as a grammar G, simply use G.

3. If w = ε then if S_G is nullable (as defined in the description of removeEps in
Section 11.7.4) then accept, otherwise reject.

4. If w ≠ ε then:

4.1. From G, construct G' such that L(G') = L(G) - {ε} and G' is in
Chomsky normal form.

4.2. If G' derives w, it does so in 2·|w| - 1 steps. Try all derivations in G' of
that number of steps. If one of them derives w, accept. Otherwise reject.

The running time of decideCFLusingGrammar can be analyzed as follows: We assume
that the time required to build G' is constant, since it does not depend on w. Let
n = |w|. Let g be the search-branching factor of G', defined to be the maximum number
of rules that share a left-hand side. Then the number of derivations of length 2n - 1
is bounded by g^(2n-1), and it takes at most 2n - 1 steps to check each one. So the
worst-case running time of decideCFLusingGrammar is O(n·g^(2n)), which is exponential
in n. In Section 15.3.1, we will present techniques that are substantially more
efficient. We will describe the CKY algorithm, which, given a grammar G in Chomsky
normal form, decides the membership question for G in time that is O(n^3). We will then
describe an algorithm that can decide the question in time that is linear in n if the
grammar that is provided meets certain requirements.
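The exhaustive search in step 4.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the grammar is assumed to be in Chomsky normal form already and is represented as a dictionary from nonterminals to lists of right-hand sides (tuples); the function name decide_cfl_using_grammar and the sample grammar for {a^n b^n : n >= 1} are hypothetical. The sketch tries leftmost derivations only, which suffices because every derivable string has a leftmost derivation.

```python
def decide_cfl_using_grammar(rules, start, w):
    """Brute-force membership test for a grammar assumed to be in CNF.

    rules maps each nonterminal to a list of right-hand sides, each a tuple
    that is either (terminal,) or (nonterminal, nonterminal).
    """
    if not w:
        # A CNF grammar generates no empty string; the empty-string case is
        # handled separately (step 3) on the original grammar.
        return False
    target = tuple(w)
    budget = 2 * len(w) - 1  # every CNF derivation of w has exactly this length

    def search(form, steps_left):
        # Find the leftmost nonterminal in the current sentential form.
        pos = next((i for i, s in enumerate(form) if s in rules), None)
        if pos is None:
            return form == target      # all terminals: compare with w
        if steps_left == 0:
            return False               # step budget exhausted on this path
        for rhs in rules[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]
            if search(new_form, steps_left - 1):
                return True
        return False

    return search((start,), budget)


# Hypothetical CNF grammar for { a^n b^n : n >= 1 }.
ANBN = {
    "S": [("A", "T"), ("A", "B")],
    "T": [("S", "B")],
    "A": [("a",)],
    "B": [("b",)],
}

print(decide_cfl_using_grammar(ANBN, "S", "aabb"))
print(decide_cfl_using_grammar(ANBN, "S", "aab"))
```

The step budget is what guarantees halting: no path is ever extended beyond 2·|w| - 1 rule applications, so the search tree is finite even though it may be exponentially large.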


THEOREM 14.1 Decidability of Context-Free Languages

Theorem: Given a context-free language L (represented as either a context-free
grammar or a PDA) and a string w, there exists a decision procedure that
answers the question, “Is w ∈ L?”

Proof: The following algorithm, decideCFL, uses decideCFLusingGrammar to
answer the question:

decideCFL(L: CFL, w: string) =

1. If decideCFLusingGrammar(L, w) accepts, return True else return False.
Using a PDA to Decide +

It is also possible to solve the membership problem using PDAs. We take a two-step
approach. We first show that, for every context-free language L, it is possible to build a
PDA that accepts L - {ε} and that has no ε-transitions. Then we show that every
PDA with no ε-transitions is guaranteed to halt.


THEOREM 14.2 Elimination of ε-Transitions

Theorem: Given any context-free grammar G = (V, Σ, R, S), there exists a PDA
M such that L(M) = L(G) - {ε} and M contains no transitions of the form
((q1, ε, α), (q2, β)). In other words, every transition reads exactly one input
character.

Proof: The proof is by a construction that begins by converting G to Greibach normal
form. Recall that, in any grammar in Greibach normal form, all rules are of
the form X → aA, where a ∈ Σ and A ∈ (V - Σ)*. Now consider again the algorithm
cfgtoPDAtopdown, which builds, from any context-free grammar G, a PDA
M that, on input w, simulates G deriving w, starting from S.
M = ({p, q}, Σ, V, Δ, p, {q}), where Δ contains:

1. The start-up transition ((p, ε, ε), (q, S)), which pushes the start symbol onto
the stack and goes to state q.

2. For each rule X → s1s2...sn in R, the transition ((q, ε, X), (q, s1s2...sn)), which
replaces X by s1s2...sn. If n = 0 (i.e., the right-hand side of the rule is ε), then
the transition ((q, ε, X), (q, ε)).

3. For each character c ∈ Σ, the transition ((q, c, c), (q, ε)), which compares an
expected character from the stack against the next input character and continues
if they match.

The start-up transition, plus all the transitions generated in step 2, are
ε-transitions. But now suppose that G is in Greibach normal form. If G contains
the rule X → cs2...sn (where c ∈ Σ and s2 through sn are elements of V - Σ), it is
not necessary to push c onto the stack, only to pop it with a rule from step 3.
Instead, we collapse the push and the pop into a single transition. So we create a
transition that can be taken only if the next input character is c. In that case, the
string s2...sn is pushed onto the stack.

Now we need only find a way to get rid of the start-up transition, whose job is
to push S onto the stack so that the derivation process can begin. Since G is in
Greibach normal form, any rules with S on the left-hand side must have the form
S → cs2...sn. So instead of reading no input and just pushing S, M will skip pushing
S and instead, if the first input character is c, read it and push the string
s2...sn.

Since terminal symbols are no longer pushed onto the stack, we no longer
need the transitions created in step 3 of the original algorithm.
So M = ({p, q}, Σ, V, Δ, p, {q}), where Δ contains:

1. The start-up transitions: For each rule S → cs2...sn, the transition
((p, c, ε), (q, s2...sn)).

2. For each rule X → cs2...sn (where c ∈ Σ and s2 through sn are elements of
V - Σ), the transition ((q, c, X), (q, s2...sn)).

The following algorithm builds the required PDA:

cfgtoPDAnoeps(G: context-free grammar) =

1. Convert G to Greibach normal form, producing G'.

2. From G' build the PDA M described above.
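Step 2 of the construction is mechanical, and a short Python sketch may help. It assumes that step 1 has already been done, i.e., that the grammar is in Greibach normal form, and it represents each rule X → c s2...sn as a pair (c, (s2, ..., sn)). The function name cfg_to_pda_noeps, the tuple encoding of transitions, and the sample grammar are all illustrative assumptions.

```python
def cfg_to_pda_noeps(gnf_rules, start):
    """Build the transitions of an epsilon-free PDA from a GNF grammar.

    gnf_rules maps each nonterminal X to a list of pairs (c, rest), one per
    rule X -> c rest, where c is a terminal and rest is a tuple of
    nonterminals. A transition is encoded as
    ((state, input_char, popped), (new_state, pushed)) with tuples for the
    stack strings; the empty tuple plays the role of epsilon on the stack.
    """
    transitions = []
    for lhs, alternatives in gnf_rules.items():
        for terminal, rest in alternatives:
            if lhs == start:
                # Start-up transition: read c and push s2...sn without
                # ever pushing S itself.
                transitions.append((("p", terminal, ()), ("q", tuple(rest))))
            # Ordinary transition: read c, pop X, push s2...sn.
            transitions.append((("q", terminal, (lhs,)), ("q", tuple(rest))))
    return transitions


# Hypothetical GNF grammar for { a^n b^n : n >= 1 }:
#   S -> a S B | a B        B -> b
GNF_ANBN = {
    "S": [("a", ("S", "B")), ("a", ("B",))],
    "B": [("b", ())],
}

for transition in cfg_to_pda_noeps(GNF_ANBN, "S"):
    print(transition)
```

Every transition produced this way has a terminal in its input position, so the resulting machine has no ε-transitions, as Theorem 14.2 requires.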


THEOREM 14.3 Halting Behavior of PDAs Without ε-Transitions

Theorem: Let M be a PDA that contains no transitions of the form
((q1, ε, s1), (q2, s2)), i.e., no ε-transitions. Consider the operation of M on input
w ∈ Σ*. M must halt and either accept or reject w. Let n = |w|. We make three
additional claims:

a. Each individual computation of M must halt within n steps.

b. The total number of computations pursued by M must be less than or equal to b^n,
where b is the maximum number of competing transitions from any state in M.

c. The total number of steps that will be executed by all computations of M is
bounded by n·b^n.

Proof:

a. Since each computation of M must consume one character of w at each step
and M will halt when it runs out of input, each computation must halt within
n steps.

b. M may split into at most b branches at each step in a computation. The number
of steps in a computation is less than or equal to n. So the total number of
computations must be less than or equal to b^n.

c. Since the maximum number of computations is b^n and the maximum length
of each is n, the maximum number of steps that can be executed before all
computations of M halt is n·b^n.


So a second way to answer the question, “Given a context-free language L and a
string w, is w in L?” is to execute the following algorithm:

decideCFLusingPDA(L: CFL, w: string) =

1. If L is specified as a PDA M, use PDAtoCFG, as presented in the proof of Theorem
12.2, to construct a grammar G such that L(G) = L(M).

2. If L is specified as a grammar G, simply use G.

3. If w = ε then if S_G is nullable (as defined in the description of removeEps in
Section 11.7.4) then accept, otherwise reject.

4. If w ≠ ε then:

4.1. From G, construct G' such that L(G') = L(G) - {ε} and G' is in
Greibach normal form.

4.2. From G' construct, using cfgtoPDAnoeps, the algorithm described in
the proof of Theorem 14.2, a PDA M' such that L(M') = L(G') and
M' has no ε-transitions.

4.3. By Theorem 14.3, all paths of M' are guaranteed to halt within a
finite number of steps. So run M' on w. Accept if M' accepts and
reject otherwise.

The running time of decideCFLusingPDA can be analyzed as follows: We will take
as a constant the time required to build M', since that can be done once. It need not
be repeated for each string that is to be analyzed. Given M', the time required to
analyze a string w is then the time required to simulate all paths of M' on w. Let
n = |w|. From Theorem 14.3, we know that the total number of steps that will be
executed by all paths of M' is bounded by n·b^n, where b is the maximum number of
competing transitions from any state in M'. But is that number of steps required? If one
state has a large number of competing transitions but the others do not, then the
average branching factor will be less than b, so fewer steps will be necessary. But if b is
greater than 1, the number of steps still grows exponentially with n. The exact number
of steps also depends on how the simulation is done. A straightforward depth-first
search of the tree of possibilities will explore b^n steps, which is less than n·b^n
because it does not start each path over at the beginning. But it still requires time
that is O(b^n). In Section 15.2.3, we present an alternative approach to top-down parsing
that runs in time that is linear in n if the grammar that is provided meets certain
requirements.
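The depth-first simulation just described can be sketched as follows. This is a minimal Python illustration, not a definitive implementation: the transition encoding ((state, input_char, popped), (new_state, pushed)), the function name run_noeps_pda, the acceptance convention (accepting state, empty stack, input consumed), and the hand-written transition table for {a^n b^n : n >= 1} are all assumptions made for the example.

```python
def run_noeps_pda(transitions, w, start="p", accepting=("q",)):
    """Depth-first simulation of a PDA that has no epsilon-transitions.

    Returns (accepted, steps), where steps counts the transitions tried.
    The stack is a tuple whose first element is the top.
    """
    steps = 0

    def explore(state, pos, stack):
        nonlocal steps
        if pos == len(w):
            # Out of input: the path halts here, accepting or not.
            return state in accepting and not stack
        for (src, char, pop), (dst, push) in transitions:
            if src != state or char != w[pos]:
                continue
            if stack[:len(pop)] != pop:
                continue
            steps += 1
            # Every transition consumes one character, so the recursion
            # depth can never exceed len(w).
            if explore(dst, pos + 1, push + stack[len(pop):]):
                return True
        return False

    return explore(start, 0, ()), steps


# Hypothetical epsilon-free PDA for { a^n b^n : n >= 1 }.
TRANSITIONS = [
    (("p", "a", ()), ("q", ("S", "B"))),
    (("p", "a", ()), ("q", ("B",))),
    (("q", "a", ("S",)), ("q", ("S", "B"))),
    (("q", "a", ("S",)), ("q", ("B",))),
    (("q", "b", ("B",)), ("q", ())),
]

print(run_noeps_pda(TRANSITIONS, "aabb"))
print(run_noeps_pda(TRANSITIONS, "aab"))
```

Because the search backtracks rather than restarting each path from the beginning, the step counter stays within the n·b^n bound of Theorem 14.3, where here b = 3 is the largest number of transitions leaving a single state.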


14.1.2  Emptiness  and  Finiteness 

While many interesting questions are not decidable for context-free languages, two
others, in addition to membership, are: emptiness and finiteness.


THEOREM  14.4  Decidability  of  Emptiness  and  Finiteness 

Theorem: Given a context-free language L, there exists a decision procedure that
answers each of the following questions:

1. Given a context-free language L, is L = ∅?

2. Given a context-free language L, is L infinite?

Since we have proven that there exists a grammar that generates L iff there
exists a PDA that accepts it, these questions will have the same answers whether
we ask them about grammars or about PDAs.
Proof:

1. Let G = (V, Σ, R, S) be a context-free grammar that generates L.
L(G) = ∅ iff S is unproductive (i.e., not able to generate any terminal
strings). The following algorithm exploits the procedure removeunproductive,
defined in Section 11.4, to remove all unproductive nonterminals from G. It
answers the question, “Given a context-free language L, is L = ∅?”

decideCFLempty(G: context-free grammar) =

1. Let G' = removeunproductive(G).

2. If S is not present in G' then return True else return False.
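The emptiness test can be sketched in Python as follows. Rather than building G' explicitly, this sketch computes the set of productive nonterminals by iterating to a fixed point, which is an assumption about how removeunproductive might be realized rather than a transcription of it. The function names, the dictionary representation of rules, and the two sample grammars are illustrative.

```python
def productive_nonterminals(rules, terminals):
    """Return the nonterminals that can derive some string of terminals.

    rules maps each nonterminal to a list of right-hand sides (tuples of
    symbols); the empty tuple stands for an epsilon rule.
    """
    productive = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in rules.items():
            if lhs in productive:
                continue
            for rhs in alternatives:
                # A rule is usable once every symbol on its right-hand side
                # is a terminal or an already-productive nonterminal.
                if all(s in terminals or s in productive for s in rhs):
                    productive.add(lhs)
                    changed = True
                    break
    return productive


def decide_cfl_empty(rules, terminals, start):
    """True iff the grammar generates no strings at all."""
    return start not in productive_nonterminals(rules, terminals)


# B can never get rid of itself, so S -> A B is useless and L(G) is empty.
EMPTY = {"S": [("A", "B")], "A": [("a",)], "B": [("B", "b")]}
# S -> a S b | epsilon generates { a^n b^n : n >= 0 }.
NONEMPTY = {"S": [("a", "S", "b"), ()]}

print(decide_cfl_empty(EMPTY, {"a", "b"}, "S"))
print(decide_cfl_empty(NONEMPTY, {"a", "b"}, "S"))
```

Each pass of the loop either marks a new nonterminal or stops, so the procedure halts after at most |V - Σ| + 1 passes.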

2. Let G = (V, Σ, R, S) be a context-free grammar that generates L. We use an
argument similar to the one that we used to prove the context-free Pumping
Theorem. Let n be the number of nonterminals in G. Let b be the branching
factor of G. The longest string that G can generate without creating a parse tree
with repeated nonterminals along some path is of length b^n. If G generates no
strings of length greater than b^n, then L(G) is finite. If G generates even one
string w of length greater than b^n, then, by the same argument we used to prove
the Pumping Theorem, it generates an infinite number of strings since
w = uvxyz, |vy| > 0, and ∀q ≥ 0 (u v^q x y^q z is in L). So we could try to test to
see whether L is infinite by invoking decideCFL(L, w) on all strings in Σ* of
length greater than b^n. If it returns True for any such string, then L is infinite. If
it returns False on all such strings, then L is finite.

But, assuming Σ is not empty, there is an infinite number of such strings.
Fortunately, it is necessary to try only a finite number of them. Suppose that G
generates even one string of length greater than b^(n+1) + b^n. Let t be the shortest
such string. By the Pumping Theorem, t = uvxyz, |vy| > 0, and uxz (the
result of pumping vy out once) ∈ L. Note that |uxz| < |t| since some nonempty
vy was pumped out of t to create it. Since, by assumption, t is the shortest
string in L of length greater than b^(n+1) + b^n, |uxz| must be less than or equal to
b^(n+1) + b^n. But the Pumping Theorem also tells us that |vxy| ≤ k (i.e., b^(n+1)),
so no more than b^(n+1) characters could have been pumped out of t. Thus we have
that b^n < |uxz| ≤ b^(n+1) + b^n. So, if L contains any strings of length greater
than b^n, it must contain at least one string whose length is greater than b^n and
less than or equal to b^(n+1) + b^n. We can now define decideCFLinfinite to answer
the question, “Given a context-free language L, is L infinite?”:

decideCFLinfinite(G: context-free grammar) =

1. Lexicographically enumerate all strings in Σ* of length greater than
b^n and less than or equal to b^(n+1) + b^n.

2. If, for any such string w, decideCFL(L, w) returns True then return
True. L is infinite.

3. If, for all such strings w, decideCFL(L, w) returns False then return
False. L is not infinite.
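The enumeration can be sketched in Python. To keep the example self-contained, it assumes that the grammar is already in Chomsky normal form (so the branching factor b is 2) and it uses a small CKY-style table as its membership test in place of decideCFL; both choices, along with the function names and sample grammars, are assumptions made for illustration. The number of candidate strings grows very quickly with the number of nonterminals, so this is only practical for tiny grammars.

```python
from itertools import product


def cky_member(rules, start, w):
    """CKY membership test for a grammar assumed to be in CNF."""
    n = len(w)
    if n == 0:
        return False
    table = {}  # table[(i, length)] = nonterminals deriving w[i:i+length]
    for i, ch in enumerate(w):
        table[(i, 1)] = {lhs for lhs, alts in rules.items() if (ch,) in alts}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            cell = set()
            for split in range(1, length):
                left = table[(i, split)]
                right = table[(i + split, length - split)]
                for lhs, alts in rules.items():
                    for rhs in alts:
                        if len(rhs) == 2 and rhs[0] in left and rhs[1] in right:
                            cell.add(lhs)
            table[(i, length)] = cell
    return start in table[(0, n)]


def decide_cfl_infinite(rules, start, alphabet):
    """True iff the CNF grammar generates infinitely many strings."""
    n = len(rules)   # number of nonterminals
    b = 2            # branching factor of a grammar in CNF
    low, high = b ** n, b ** (n + 1) + b ** n
    for length in range(low + 1, high + 1):
        # product over a sorted alphabet yields strings in lexicographic order
        for chars in product(sorted(alphabet), repeat=length):
            if cky_member(rules, start, "".join(chars)):
                return True
    return False


INFINITE = {"S": [("S", "S"), ("a",)]}          # generates a, aa, aaa, ...
FINITE = {"S": [("A", "A")], "A": [("a",)]}     # generates only aa

print(decide_cfl_infinite(INFINITE, "S", {"a"}))
print(decide_cfl_infinite(FINITE, "S", {"a"}))
```

For INFINITE there is one nonterminal, so only strings of length 3 through 6 are tried; for FINITE there are two, so the window is lengths 5 through 12.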
14.1.3 Equality of Deterministic Context-Free Languages

THEOREM 14.5 Decidability of Equivalence for Deterministic Context-Free Languages

Theorem: Given two deterministic context-free languages L1 and L2, there exists a
decision procedure to determine whether L1 = L2.

Proof: This claim was not proved until 1997 and the proof [Sénizergues 2001] is
beyond the scope of this book.


14.2  The  Undecidable  Questions 

Unfortunately, we will prove in Chapter 22 that there exists no decision procedure for
many other questions that we might like to be able to ask about context-free languages,
including:

• Given a context-free language L, is L = Σ*?

• Given a context-free language L, is the complement of L context-free?

• Given a context-free language L, is L regular?

• Given two context-free languages L1 and L2, is L1 = L2? (Theorem 14.5 tells us
that this question is decidable for the restricted case of two deterministic
context-free languages. But it is undecidable in the more general case.)

• Given two context-free languages L1 and L2, is L1 ⊆ L2?

• Given two context-free languages L1 and L2, is L1 ∩ L2 = ∅?

• Given a context-free language L, is L inherently ambiguous?

• Given a context-free grammar G, is G ambiguous?


14.3  Summary  of  Algorithms  and  Decision  Procedures 
for  Context-Free  Languages 

Although we have presented fewer algorithms and decision procedures for context-free
languages than we did for regular languages, there are many important ones,
which we summarize here:

• Algorithms that transform grammars:

• removeunproductive(G: context-free grammar): Construct a grammar G' that
contains no unproductive nonterminals and such that L(G') = L(G).

• removeunreachable(G: context-free grammar): Construct a grammar G' that
contains no unreachable nonterminals and such that L(G') = L(G).

• removeEps(G: context-free grammar): Construct a grammar G' that contains
no rules of the form X → ε and such that L(G') = L(G) - {ε}.

• atmostoneEps(G: context-free grammar): Construct a grammar G' that contains
no rules of the form X → ε except possibly S* → ε, in which case there
are no rules whose right-hand side contains S*, and such that L(G') = L(G).

• converttoChomsky(G: context-free grammar): Construct a grammar G' in
Chomsky normal form, where L(G') = L(G) - {ε}.

• converttoGreibach(G: context-free grammar): Construct a grammar G' in
Greibach normal form, where L(G') = L(G) - {ε}.

• removeUnits(G: context-free grammar): Construct a grammar G' that contains
no unit productions, where L(G') = L(G).

• Algorithms that convert between context-free grammars and PDAs:

• cfgtoPDAtopdown(G: context-free grammar): Construct a PDA M such that
L(M) = L(G) and M operates top-down to simulate a left-most derivation in G.

• cfgtoPDAbottomup(G: context-free grammar): Construct a PDA M such that
L(M) = L(G) and M operates bottom-up to simulate, backwards, a right-most
derivation in G.

• cfgtoPDAnoeps(G: context-free grammar): Construct a PDA M such that M contains
no transitions of the form ((q1, ε, s1), (q2, s2)) and L(M) = L(G) - {ε}.

• Algorithms that transform PDAs:

• convertPDAtorestricted(M: PDA): Construct a PDA M' in restricted normal
form where L(M') = L(M).

• Algorithms that compute functions of languages defined as context-free grammars:

• Given two grammars G1 and G2, construct a new grammar G3 such that
L(G3) = L(G1) ∪ L(G2).

• Given two grammars G1 and G2, construct a new grammar G3 such that
L(G3) = L(G1)L(G2).

• Given a grammar G, construct a new grammar G' such that L(G') = (L(G))*.

• Given a grammar G, construct a new grammar G' such that L(G') = (L(G))^R.

• Given a grammar G, construct a new grammar G' that generates letsub(L(G)),
where letsub is a letter substitution function.

• Miscellaneous algorithms for PDAs:

• intersectPDAandFSM(M1: PDA, M2: FSM): Construct a PDA M3 such that
L(M3) = L(M1) ∩ L(M2).

• without$(M: PDA): If M accepts L$, construct a PDA M' such that L(M') = L.

• complementdetPDA(M: DPDA): If M accepts L$, construct a PDA M' such
that L(M') = (¬L)$.
• Decision procedures that answer questions about context-free languages:

• decideCFLusingPDA(L: CFL, w: string): Decide whether w is in L.

• decideCFLusingGrammar(L: CFL, w: string): Decide whether w is in L.

• decideCFL(L: CFL, w: string): Decide whether w is in L.

• decideCFLempty(G: context-free grammar): Decide whether L(G) = ∅.

• decideCFLinfinite(G: context-free grammar): Decide whether L(G) is infinite.


Exercises 

1.  Give  a  decision  procedure  to  answer  each  of  the  following  questions: 

a. Given a regular expression α and a PDA M, is the language accepted by M a
subset of the language generated by α?

b. Given a context-free grammar G and two strings s1 and s2, does G generate
s1s2?

c. Given a context-free grammar G, does G generate at least three strings?

d. Given a context-free grammar G, does G generate any even length strings?

e. Given a regular grammar G, is L(G) context-free?


CHAPTER 


15 


Context-Free  Parsing* 


Programming languages are (mostly) context-free. Query languages are usually
context-free. English can, in large part, be considered context-free. Strings in
these languages need to be analyzed and interpreted by compilers, query engines,
and various other kinds of application programs. So we need an algorithm that
can, given a context-free grammar G:

1. Examine a string and decide whether or not it is a syntactically well-formed
member of L(G), and

2.  If  it  is,  assign  to  it  a  parse  tree  that  describes  its  structure  and  thus  can  be  used  as 
the  basis  for  further  interpretation. 


Are  programming  languages  really  context-free?  (G.2) 


In Section 14.1.1, we described two techniques that can be used to construct, from a
grammar G, a decision procedure that answers the question, “Given a string w, is w in
L(G)?” But we aren’t done. We must still deal with the following issues:

• The first procedure, decideCFLusingGrammar, requires a grammar that is in
Chomsky normal form. The second procedure, decideCFLusingPDA, requires a
grammar that is in Greibach normal form. We would like to use a natural grammar
so that the parsing process can produce a natural parse tree.

•  Both  procedures  require  search  and  take  time  that  grows  exponentially  in  the 
length  of  the  input  string.  But  we  need  efficient  parsers,  preferably  ones  that  run  in 
time  that  is  linear  in  the  length  of  the  input  string. 

•  All  either  procedure  does  is  to  determine  membership  in  L(G).  It  does  not  produce 
parse  trees. 


Query languages are context-free. (Q.1.1)
In this chapter we will sketch solutions to all of these problems. The discussion will
be organized as follows:

• Easy issues:

• Actually building parse trees: All of the parsers we will discuss work by applying
grammar rules. So, to build a parse tree, it suffices to augment the parser with a
function that builds a chunk of tree every time a rule is applied.

• Using lookahead to reduce nondeterminism: It is often possible to reduce (or
even eliminate) nondeterminism by allowing the parser to look ahead at the
next one or more input symbols before it makes a decision about what to do.

• Lexical analysis: a preprocessing step in which strings of individual input characters
are divided into strings of larger units, called tokens, that can be input to a parser.

• Top-down parsers:

• A simple but inefficient recursive descent parser.

• Modifying a grammar for top-down parsing.

• LL parsing.

• Bottom-up parsers:

• The simple but not efficient enough Cocke-Kasami-Younger (CKY) algorithm.

• LR parsing.

• Parsers for English and other natural languages.

As we'll see, the bottom line on the efficiency of context-free parsing is the following.
Let n be the length of the string to be parsed. Then:

• There exists a straightforward algorithm (CKY) that can parse any context-free
language in O(n^3) time. While this is substantially better than the exponential
time required to simulate the kind of nondeterministic PDAs that we built in
Section 12.3, it isn't good enough for many practical applications. In addition,
CKY requires its grammar to be in Chomsky normal form. There exists a much
less straightforward version of CKY that can parse any context-free language in
close to O(n^2) time.

• There exist algorithms that can parse large subclasses of context-free languages
(including many of the ones we care about, like most programming languages and
query languages) in O(n) time. There are reasonably straightforward top-down
algorithms that can be built by hand. There are more efficient, more complicated
bottom-up ones. But there exist tools that make building practical bottom-up
parsers very easy.

• Parsing English, or any other natural language, is harder than parsing most
artificial languages, which can be designed with parsing efficiency in mind.
level = observation - 17.5;

(a)

(b) [token sequence; the boxed symbols are not recoverable from the source]

FIGURE 15.1 Lexical analysis.

15.1  Lexical  Analysis 

Consider the input string shown in Figure 15.1 (a). It contains 27 characters, including
blanks.  The  job  of  lexical  analysis  is  to  convert  it  into  a  sequence  of  symbols  like  the 
one  shown  in  Figure  15.1  (b). 

We  call  each  of  the  symbols  that  the  lexical  analyzer  produces  a  token.  So,  in  this 
simple  example,  there  are  6  tokens.  In  addition  to  creating  a  token  stream,  the  lexical 
analyzer  must  be  able  to  associate,  with  each  token,  some  information  about  how  it  was 
formed.  That  information  will  matter  when  it  comes  time  to  assign  meaning  to  the 
input  string  (for  example  by  generating  code). 

In  principle,  we  could  skip  lexical  analysis.  We  could  instead  extend  every  grammar  to 
include  the  rules  by  which  simple  constituents  like  identifiers  and  numbers  are  formed. 


EXAMPLE  15.1  Specifying  id  with  a  Grammar 

We could change our arithmetic expression grammar (from Example 11.19) so that
id is a nonterminal rather than a terminal. We’d then have to add rules such as:

id → identifier | integer | float

identifier → letter alphanum          /* a letter followed by zero or more alphanumerics. */

alphanum → letter alphanum | digit alphanum | ε

integer → -unsignedint | unsignedint  /* an optional minus sign followed by an unsigned integer. */

unsignedint → digit | digit unsignedint

digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9


But there is an easier way to handle this early part of the parsing problem. We can
write regular expressions that define legal identifiers and numbers. Those regular
expressions can then be compiled into deterministic finite state machines, which can run
in time that is linear in the length of the input.
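As a rough illustration of this idea, the following Python sketch tokenizes the input string of Figure 15.1 (a) with the standard re module. The pattern names, the choice to report identifiers and numbers uniformly as the token id (following Example 15.1), the decision to treat the minus sign as a separate token, and the function name tokenize are assumptions made for this example. Python's regular expression engine is a backtracking matcher rather than a compiled deterministic FSM, so this shows the specification style, not the linear-time guarantee.

```python
import re

# One named alternative per kind of lexeme. Python tries alternatives in
# order, so the float pattern is listed before the integer pattern.
TOKEN_PATTERN = re.compile(r"""
    (?P<blank>[ \t]+)
  | (?P<identifier>[A-Za-z][A-Za-z0-9]*)
  | (?P<float>[0-9]+\.[0-9]+)
  | (?P<integer>[0-9]+)
  | (?P<punct>[=\-+*/();])
""", re.VERBOSE)


def tokenize(text):
    """Return a list of (token, lexeme) pairs for the input text."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_PATTERN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "punct":
            tokens.append((lexeme, lexeme))   # operators stand for themselves
        elif kind != "blank":
            tokens.append(("id", lexeme))     # identifiers and numbers become id
        pos = m.end()
    return tokens


for token in tokenize("level = observation - 17.5;"):
    print(token)
```

Keeping the lexeme alongside each token is one simple way to retain the information about how the token was formed, which later stages need in order to assign meaning to the input.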
Useful tools for building lexical analyzers, also called lexers, are widely available.
Lex is a good example of such a tool. The input to Lex is a set of rules. The left-hand
side of each rule is a regular expression that describes the input strings to which the
rule should apply. The right-hand side of each rule (enclosed in curly brackets)
describes the output that should be created whenever the rule matches. The output of Lex
is a lexical analyzer. When the analyzer runs, it matches its rules against an input
stream. Any text that is not matched by any rule is simply echoed back into the output
stream. Any text that is matched is replaced in the output stream by the right-hand side
of the matching rule. The analyzer assumes a specific pattern of run-time communication
between itself and a context-free parser to which it will be streaming tokens. In
particular, it assumes the existence of a few shared variables, including one called
yylval, into which the value that corresponds to the current token can be placed.


EXAMPLE  15.2  Some  Simple  Lex  Rules 

Here are some simple Lex rules:

1. [ \t]+ ;                                      /* Get rid of blanks and tabs. */

2. [A-Za-z][A-Za-z0-9]*   { return(ID); }        /* Find identifiers. */

3. [0-9]+                 { sscanf(yytext, "%d", &yylval);
                            return(INTEGER); }   /* Return INTEGER and put the value in yylval. */

• Rule 1 has just a left-hand side, which matches any string composed of just
blanks and tabs. Since it has an empty right-hand side, the string it matches will
be replaced by the empty string. So it could be used to get rid of blanks and
tabs in the input if their only role is as delimiters. In this case, they will not
correspond to any symbols in the grammar that the parser will use.

• Rule 2 has a left-hand side that can match any alphanumeric string that starts
with a letter. Any substring it matches will be replaced by the value of its
right-hand side, namely the token ID. So this rule could be used to find identifiers.
But since no information about what identifier was found is recorded, this rule
is too simple for most applications.

• Rule 3 could be used to find integers. It returns the token INTEGER. But it also
places the specific value that it matched into the shared variable yylval.


If  two  Lex  rules  match  against  a  single  piece  of  input  text,  the  analyzer  chooses 
between  them  as  follows: 

•  A  longer  match  is  preferred  over  a  shorter  one. 

•  Among  rules  that  match  the  same  number  of  input  characters,  the  one  that  was 
written  first  in  the  input  to  Lex  is  preferred. 
EXAMPLE 15.3 How Lex Chooses Which Rule to Apply

Suppose that Lex has been given the following two rules:

1. integer {action 1}

2. [a-z]+ {action 2}

Now consider what the analyzer it builds will do on the following input
sequences:

integers    take action 2 because rule (2) matches the entire string
            integers, while rule (1) matches only the first 7 characters.

integer     take action 1 because both patterns match all 7 characters
            and rule (1) comes first.
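The two preference rules can be mimicked in a few lines of Python. This sketch is not how Lex is implemented (Lex compiles its rules into a single automaton); it simply applies each pattern in turn with the standard re module and keeps the longest match, breaking ties in favor of the earlier rule. The function name choose_rule and the triple it returns are assumptions for illustration, and the sketch relies on each pattern's match at the start of the text being that pattern's longest possible match, which holds for the two patterns used here.

```python
import re


def choose_rule(rules, text):
    """Pick the rule a longest-match, first-rule-wins lexer would apply.

    rules is a list of (pattern, action) pairs in the order written.
    Returns (match_length, rule_number, action), or None if nothing matches.
    """
    best = None
    for number, (pattern, action) in enumerate(rules, start=1):
        m = re.match(pattern, text)
        # The strict '>' keeps the earlier rule when two matches tie in length.
        if m and m.end() > 0 and (best is None or m.end() > best[0]):
            best = (m.end(), number, action)
    return best


RULES = [("integer", "action 1"), ("[a-z]+", "action 2")]

print(choose_rule(RULES, "integers"))
print(choose_rule(RULES, "integer"))
```

On integers the second pattern wins because it matches 8 characters against 7; on integer both match 7 characters, so the rule written first is chosen.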


Lex  was  specifically  designed  as  a  tool  for  building  lexical  analyzers  to  work  with 
parsers  generated  with  the  parser-building  tool  Yacc,  which  we  will  describe  in 
Section  15.3. 

15.2 Top-Down Parsing

A top-down parser for a language defined by a grammar G works by creating a parse
tree with a root labeled S_G. It then builds the rest of the tree, working downward from
the root, using the rules in R_G. Whenever it creates a node that is labeled with a
terminal symbol, it checks to see that it matches the next input symbol. If it does, the
parser continues until it has built a tree that spans the entire input string. If the match
fails, the parser terminates that path and tries an alternative way of applying the rules.
If it runs out of alternatives, it reports failure. For some languages, described with
certain kinds of grammars, it is possible to do all of this without ever having to consider
more than one path, generally by looking one character ahead in the input stream before a
decision about what to do next is made. We’ll begin by describing a very general parser
that conducts a depth-first search and typically requires backtracking. Then we’ll consider
grammar restrictions that may make deterministic top-down parsing possible.

15.2.1  Depth-First  Search 

We begin by describing a simple top-down parser that works in essentially the same
way that the top-down PDAs that we built in Section 12.3.1 did. It attempts to
reconstruct a left-most derivation of its input string. The only real difference is that it
is now necessary to describe how nondeterminism will be handled. We’ll use depth-first
search with backtracking. The algorithm that we are about to present is similar to
decideCFLusingPDA. They are both nondeterministic, top-down algorithms. But the
one we present here, in contrast to decideCFLusingPDA, does not require a grammar
in any particular form.
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EXAMPLE  15.4  Top-Down,  Depth-First  Parsing 

To see how a depth-first, top-down parser works, let's consider an English
grammar that is even simpler than the one we used in Example 11.6. This time, we
will require that every sentence end with the end-of-string marker $:

S → NP VP $
NP → the N | N | ProperNoun
N → cat | dogs | bear | girl | chocolate | rifle
ProperNoun → Chris | Fluffy
VP → V | V NP
V → like | likes | thinks | shot | smells

On input the cat likes chocolate $, the parser, given these rules, will behave
as follows:

• Build an S using the only rule available:

                 S
           /     |     \
        NP       VP       $

• Build an NP. Start with the first alternative, which successfully matches the
first input symbol:

                 S
           /     |     \
        NP       VP       $
       /  \
     the   N

• Build an N. Start with the first alternative, which successfully matches the next
input symbol:

                 S
           /     |     \
        NP       VP       $
       /  \
     the   N
           |
          cat

• Build a VP. Start with the first alternative:

                 S
           /     |     \
        NP       VP       $
       /  \      |
     the   N     V
           |
          cat

• Build a V. The first alternative, like, fails to match the input. The second,
likes, matches:

                 S
           /     |     \
        NP       VP       $
       /  \      |
     the   N     V
           |     |
          cat  likes

• Match $. This fails, since the word chocolate remains in the input. So the
process undoes decisions, in order, until it has backtracked to:

                 S
           /     |     \
        NP       VP       $
       /  \
     the   N
           |
          cat

• Build a VP. This time, try the second alternative:

                 S
           /     |     \
        NP       VP       $
       /  \     /  \
     the   N   V    NP
           |
          cat

• Continue until a tree that spans the entire input has been built:

                 S
           /     |     \
        NP       VP       $
       /  \     /  \
     the   N   V    NP
           |   |    |
          cat likes N
                    |
                chocolate
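
The behavior traced in this example can be sketched as a small backtracking recognizer. The following Python is a minimal illustration, not the book's algorithm verbatim: it assumes the input has already been split into words, it represents a tree as a (label, children) pair, and the names GRAMMAR, parse, and parse_sentence are hypothetical. Generators supply the backtracking: when a later match fails, the search resumes with the next untried alternative.

```python
# The grammar of Example 15.4: nonterminal -> list of alternatives.
GRAMMAR = {
    "S": [["NP", "VP", "$"]],
    "NP": [["the", "N"], ["N"], ["ProperNoun"]],
    "N": [["cat"], ["dogs"], ["bear"], ["girl"], ["chocolate"], ["rifle"]],
    "ProperNoun": [["Chris"], ["Fluffy"]],
    "VP": [["V"], ["V", "NP"]],
    "V": [["like"], ["likes"], ["thinks"], ["shot"], ["smells"]],
}

def parse(symbols, tokens, pos):
    """Yield every (trees, next_pos) such that the symbol sequence
    `symbols` derives tokens[pos:next_pos]. Alternatives are tried in
    the order in which the grammar lists them (depth-first)."""
    if not symbols:
        yield [], pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:                      # nonterminal: try each rule
        for rhs in GRAMMAR[first]:
            for children, mid in parse(rhs, tokens, pos):
                for trees, end in parse(rest, tokens, mid):
                    yield [(first, children)] + trees, end
    elif pos < len(tokens) and tokens[pos] == first:   # terminal: must match
        for trees, end in parse(rest, tokens, pos + 1):
            yield [first] + trees, end

def parse_sentence(tokens):
    """Return the first parse tree that spans all of tokens, or None."""
    for trees, end in parse(["S"], tokens, 0):
        if end == len(tokens):
            return trees[0]
    return None
```

For example, parse_sentence("the cat likes chocolate $".split()) builds the VP → V alternative first, fails to match $, and only then succeeds with VP → V NP. On a grammar with left-recursive rules this sketch would recurse without consuming input, which is the looping problem discussed below.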


While  parsers  such  as  this  are  simple  to  define,  there  are  two  problems  with  using 
them  in  practical  situations: 

•  It  is  possible  to  get  into  an  infinite  loop,  even  when  there  is  a  correct  parse  for  the 
input. 

• Backtracking is expensive. Some constituents may be built and unbuilt many
times. For example, the constituent V → likes was built twice in the simple sentence
shown in Example 15.4. We'll illustrate this problem on a larger scale in the
next example.


EXAMPLE  15.5  Subtrees  May  Be  Built  and  Discarded  Many  Times 

Suppose we have the following rules for noun phrases:

NP → the Nominal | Nominal | ProperNoun | NP PP
Nominal → N | Adjs N
Adjs → Adv Adjs | Adjs and Adjs | Adj Adjs | Adj
N → student | raincoat
Adj → tall | self-possessed | green
Adv → strikingly
PP → Prep NP
Prep → with

Now consider the noun phrase the strikingly tall and self-possessed
student with the green raincoat. In an attempt to parse this phrase as an
NP, a depth-first, top-down parser will first try to use the rule NP → the
Nominal. In doing so, it will build the tree:

                  NP
                /    \
             the      Nominal
                     /       \
                 Adjs         N
                /    \        |
             Adv      Adjs  student
              |     /  |   \
      strikingly Adjs and  Adjs
                  |         |
                 Adj       Adj
                  |         |
                tall   self-possessed


Then it will notice that four symbols, with the green raincoat, remain. At
that point, it will have to back all the way up to the top NP and start over, this time
using the rule NP → NP PP. It will eventually build an NP that spans the entire
phrase*. But that NP will have, as a constituent, the one we just built and threw
away. So the entire tree we showed above will have to be rebuilt from scratch.


Because constituents may be built and rebuilt many times, the depth-first algorithm
that we just sketched may take time that is O(g^n), where g is the maximum number of
alternatives for rewriting any nonterminal in the grammar and n is the number of input
symbols.

Both the problem of infinite loops and the problem of inefficient rebuilding of constituents
during backtracking can sometimes be fixed by rearranging the grammar
and/or by looking ahead one or more characters before making a decision. In the next
two sections we'll see how this may be done.

15.2.2  Modifying  a  Grammar  for  Top-Down  Parsing 

Some grammars are better than others for top-down parsing. In this section we consider
two issues: preventing the parser from getting into an infinite loop and using lookahead
to reduce nondeterminism.


Left-Recursive  Rules  and  Infinite  Loops 

A top-down parser can get into an infinite loop and fail to find a complete parse tree,
even when there is one.


* A separate issue is that this phrase is ambiguous. We've shown the parse that corresponds to the bracketing
strikingly (tall and self-possessed). An alternative parse corresponds to the bracketing (strikingly
tall) and self-possessed.

EXAMPLE 15.6 Backtracking Gets Stuck on Left-Recursive Rules

We consider again the term/factor grammar for arithmetic expressions shown
in Example 11.19:

E → E + T
E → T
T → T * F
T → F
F → (E)
F → id

On input id + id + id, a top-down parser will behave as follows:


• Build an E, using the first alternative:

          E
        / | \
       E  +  T

• Build an E, using the first alternative:

          E
        / | \
       E  +  T
     / | \
    E  +  T

• Build an E, using the first alternative, and so forth, forever, expanding the leftmost
E as E + T.


The problem is the existence in the grammar of left-recursive rules like E → E + T
and T → T * F. Paralleling the definition we gave in Section 11.2 for a recursive rule, we
say that a grammar rule is left-recursive iff it is of the form X → Y w_2 and Y ⇒*_G X w_4,
where w_2 and w_4 may be any element of V*. If the rules were rewritten so that the
recursive symbols were on the right of the right-hand side rather than on the left of it,
the parser would be able to make progress and consume the input symbols.

We first consider direct recursion, i.e., rules of the form X → X w_2. This case includes
the rules E → E + T and T → T * F. Suppose that such a rule is used to derive a string
in L(G). For example, let's use the rule E → E + T. Then there is a derivation that
looks like:

E ⇒ E + T ⇒ E + T + T ⇒ ... ⇒ T + T ... + T ⇒ ....

In other words, the left-recursive rule is applied some number of times but then the
recursion stops and some nonrecursive rule with the same left-hand side is applied. In
this example, it was the rule E → T. The leftmost symbol in the string we just derived
came from the nonrecursive rule. So an alternative way to generate that string would
be to generate that leftmost T first, by applying once a new rule E → T E'. Then we
can generate the rest by applying, as many times as necessary, a new recursive (but not
left-recursive) rule E' → + T E', followed by a clean-up rule E' → ε, which will stop
the recursion.

Applying this idea to our arithmetic expression grammar, we get the new grammar:

E → T E'
E' → + T E'
E' → ε
T → F T'
T' → * F T'
T' → ε
F → (E)
F → id

We  can  describe  what  we  just  did  more  generally  as  follows:  Given  any  context-free 
grammar  G,  if  G  contains  any  left-recursive  rule  with  left-hand  side  A,  then  consider  all 
rules  in  G  with  left-hand  side  A.  Divide  them  into  two  groups,  the  left-recursive  ones 
and  the  others.  Replace  all  of  them  with  new  rules,  as  shown  in  Table  15.1. 

If, in addition to removing left-recursion, we want to avoid introducing ε-rules, we
can use a variant of this algorithm. Instead of always generating A' and then erasing
it at the end of the recursive part of a derivation, we create rules that allow it not to
be generated. So we replace each original left-recursive rule, A → A α_k, with two new
rules: A' → α_k A' and A' → α_k. Instead of replacing each original nonleft-recursive
rule, A → β_k, we keep it and add the new rule: A → β_k A'. We do not add the rule
A' → ε. Because we will have another use for this variant of the algorithm (in converting
grammars to Greibach normal form), we will give it the name
removeleftrecursion(N: nonterminal symbol).


Table 15.1 Eliminating left-recursive rules.

Original left-recursive rules:        Replace with:
    A → A α_1                             A' → α_1 A'
    A → A α_2                             A' → α_2 A'
    ...                                   ...
    A → A α_n                             A' → α_n A'
                                          A' → ε

Original nonleft-recursive rules:     Replace with:
    A → β_1                               A → β_1 A'
    A → β_2                               A → β_2 A'
    ...                                   ...
    A → β_m                               A → β_m A'

Unfortunately, the technique that we have just presented, while it does eliminate
direct left recursion, does not solve the entire problem. Consider the following
grammar G:

S → Ya
Y → Sa
Y → ε

G contains no directly left-recursive rules. But G does contain left-recursive rules,
and a top-down parser that used G would get stuck building the infinite left-most
derivation S ⇒ Ya ⇒ Saa ⇒ Yaaa ⇒ Saaaa ⇒ .... It is possible to eliminate this
kind of left-recursion as well by using an algorithm that loops through all the nonterminal
symbols in G and applies the algorithm that we just presented to eliminate direct
left-recursion.
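
The transformation of Table 15.1 for direct left-recursion can be sketched as follows. This is a minimal illustration under stated assumptions: a grammar is represented as a Python dictionary from each nonterminal to a list of right-hand sides, the empty list stands for ε, and the new nonterminal is named by appending a prime (which assumes that name is not already in use). The function name remove_direct_left_recursion is hypothetical. The sketch handles only direct left-recursion, not the indirect case just described.

```python
def remove_direct_left_recursion(rules, a):
    """Apply the Table 15.1 transformation to nonterminal a.

    rules maps each nonterminal to a list of right-hand sides; each
    right-hand side is a list of symbols, and [] stands for ε.
    Returns a new dictionary; the input is not modified.
    """
    new = a + "'"                                        # assumed to be unused
    alphas = [rhs[1:] for rhs in rules[a] if rhs and rhs[0] == a]
    betas = [rhs for rhs in rules[a] if not rhs or rhs[0] != a]
    result = dict(rules)
    if not alphas:                                       # nothing to do
        return result
    result[a] = [beta + [new] for beta in betas]         # A  -> beta_i A'
    result[new] = [alpha + [new] for alpha in alphas]    # A' -> alpha_i A'
    result[new].append([])                               # A' -> ε
    return result
```

Applied to E and then to T in the term/factor grammar of Example 15.6, this sketch yields the rules E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε shown earlier.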

So left-recursion can be eliminated and the problem of infinite looping in top-down
parsers can be solved. Unfortunately, the elimination of left-recursion comes at a price.
Consider the input string id + id + id. Figure 15.2(a) shows the parse tree that will
be produced for it using our original expression grammar. Figure 15.2(b) shows the
new one, with no left-recursive rules.

Notice that, in the original parse tree, the + operator associates left, while in the
new parse tree it associates right. Since the goal of producing a parse tree is to serve as
the first step toward assigning meaning to the input string (for example, by writing
code to correspond to it), this change is significant. In order to solve our parsing problem,
we've changed the meaning of at least some strings in the language we are trying
to parse.


[Figure 15.2: two parse trees for id + id + id. (a) Using the original rules, the
tree groups the input as (id + id) + id. (b) Using the new rules, with no
left-recursion, the tree groups it as id + (id + id).]

FIGURE 15.2 Removing left-recursion leads to different parse trees.

Using Lookahead and Left Factoring to Reduce Nondeterminism

As we saw in Example 15.4 (the simple English example), a depth-first, top-down parser
may have to explore multiple derivations before it finds the one that corresponds to the
current input string. The process we just described for getting rid of left-recursive rules
does nothing to affect that. So we would still like to find a technique for reducing or
eliminating the need for search. Sometimes it is possible to analyze a grammar in advance
and determine that some paths will never lead to a complete derivation of a
string of terminal symbols. But the more important case arises, as it did even with our
very simple grammar of English, when the correct derivation depends on the current
input string. When that happens, the best source of guidance for the parser is the input
string itself and its best strategy is to procrastinate branching as long as possible in
order to be able to use the input to inform its decisions. To implement this strategy,
we'll consider doing two things:

• Changing the parsing algorithm so that it exploits the ability to look one symbol
ahead in the input before it makes a decision about what to do next, and

• Changing the grammar to help the parser procrastinate decisions.

We can explore both of these issues by considering just a fragment of our arithmetic-expression
grammar, which we'll augment with one new rule that describes simple function
calls. So consider just the following set of three rules:

1. F → (E)

2. F → id

3. F → id(E)    /* This is a new rule that describes a call to a unary function.

If a top-down parser needs to expand a node labeled F, which rule should it use? If it
can look one symbol ahead in the input before deciding, it can choose between rule 1,
which it should apply if the next symbol is (, and rules 2 and 3, one of which should be
applied if the next symbol is id.

But how can a parser choose between rules 2 and 3 if it can look only one symbol
ahead? The answer is to change the grammar so that the decision can be procrastinated.
In particular, we will rewrite the grammar as:

1. F → (E)

1.1. F → id X

2. X → ε

3. X → (E)

Now, if the lookahead symbol is id, the parser will apply rule 1.1. Then it will match
the id and set the lookahead symbol to the following symbol. Next it must decide
whether to expand X by rule 2 or rule 3. But it is one symbol farther along in the input
as it faces this decision. If the next input symbol is (, it is possible that either rule should
be chosen (although see below for an additional technique that may resolve this conflict
as well). But if the next input symbol is anything else, only rule 2 can possibly lead
to a complete parse.

The operation that we just did is called left factoring. It can be described as follows.
Let G be a context-free grammar that contains two or more rules with the same left-hand
side and the same initial sequence of symbols on the right-hand side. Suppose
those rules are:

A → α β_1
A → α β_2
...
A → α β_n

where α ≠ ε and n ≥ 2. We remove those rules from G and replace them with the
rules:

A → α A'
A' → β_1
A' → β_2
...
A' → β_n

A parser that uses this new grammar will still have to make a decision about what to
do after it has read the input sequence α. But it will probably be farther along in the
input string by the time it has to do that.
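
One pass of left factoring can be sketched as follows. The sketch rests on several assumptions that are not in the text: rules are stored as a dictionary from nonterminals to lists of right-hand sides (with [] for ε), alternatives are grouped by their first symbol, each group of two or more is factored on its longest common prefix α, and new nonterminals are named by appending primes. The names common_prefix and left_factor are hypothetical.

```python
def common_prefix(seqs):
    """Return the longest common prefix of a list of symbol lists."""
    prefix = []
    for column in zip(*seqs):
        if all(sym == column[0] for sym in column):
            prefix.append(column[0])
        else:
            break
    return prefix

def left_factor(rules, a):
    """Left-factor the alternatives of nonterminal a (one pass).

    Returns a new dictionary; the input is not modified.
    """
    groups = {}
    for rhs in rules[a]:
        key = rhs[0] if rhs else None          # None marks the ε alternative
        groups.setdefault(key, []).append(rhs)
    result = dict(rules)
    result[a] = []
    primes = 0
    for key, group in groups.items():
        if key is None or len(group) < 2:
            result[a].extend(group)            # nothing to factor
            continue
        alpha = common_prefix(group)
        primes += 1
        new = a + "'" * primes                 # assumed to be unused
        result[a].append(alpha + [new])        # A  -> alpha A'
        result[new] = [rhs[len(alpha):] for rhs in group]   # A' -> beta_i
    return result
```

On the three F rules above, the sketch produces F → (E) | id F' and F' → ε | (E), which is the rewritten grammar with F' playing the role of X.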


15.2.3  Deterministic  Top-Down  Parsing  with  LL(1)  Grammars 

Can we do better? We know, from Theorem 13.13, that there exist context-free languages
for which no deterministic PDA exists. So there will be context-free languages for which
the techniques that we have just described will not be able to remove all sources of nondeterminism.
But do we care about them? Could we build a deterministic, linear-time
parser for:

1.  A  typical  programming  language  like  Java  or  C++  or  Haskell? 

2.  A  typical  database  query  language? 

3.  A  typical  Web  search  query  language? 

4.  English  or  Chinese? 


The answer to questions 1 through 3 is yes. The answer to question 4 is mostly no, although
there have been some partially successful attempts to do so. Using techniques
such as the ones we just described, it is sometimes possible to craft a grammar for
which a deterministic top-down parser exists. Such parsers are often also called
predictive parsers. To simplify the rest of this discussion, assume that every input string
ends with the end-of-string marker $. This means that, until after the $ is reached, there
is always a next symbol, which we will call the lookahead character.

It will be possible to build a predictive top-down parser for a grammar G precisely
in case every string that is generated by G has a unique left-most derivation and it is
possible to determine each step in that derivation by looking ahead some fixed number
k of characters in the input stream. In this case, we say that G is LL(k), so named because
an LL(k) grammar allows a predictive parser that scans its input left to right (the
origin of the first L in the name) to build a left-most derivation (the origin of the second
L) if it is allowed k lookahead symbols. Note that every LL(k) grammar is unambiguous
(because every string it generates has a unique left-most derivation). It is not
the case, however, that every unambiguous grammar is LL(k).

Most predictive parsers use a single lookahead symbol. So we are interested in determining
whether or not a grammar G is LL(1). To do so, it is useful to define two
functions:

• Given a grammar G and a sequence of symbols α, define first(α) to be the set of all
terminal symbols that can occur as the first symbol in any string derived from α
using R_G. If α derives ε, then ε ∈ first(α).

• Given a grammar G and a nonterminal symbol A, define follow(A) to be the set of
all terminal symbols that can immediately follow whatever A produces in some
string in L(G).


EXAMPLE  15.7  Computing  First  and  Follow 

Consider the following simple grammar G:

S → AXB$
A → aA | ε
X → c | ε
B → bB | ε

• first(S) = {a, c, b, $}.

• first(A) = {a, ε}.

• first(AX) = {a, c, ε}.

• first(AXB) = {a, c, b, ε}.

• follow(S) = ∅.

• follow(A) = {c, b, $}.

• follow(X) = {b, $}.

• follow(B) = {$}.

We can now state the conditions under which a grammar G is LL(1). It is iff, whenever
G contains two competing rules A → α and A → β, all of the following are true:

• There is no terminal symbol that is an element of both first(α) and first(β).

• ε cannot be derived from both α and β.

• If ε can be derived from one of α or β, assume it is α. Then there may be two competing
derivations:

      S ⇒* γ_1 A γ_2        and        S ⇒* γ_1 A γ_2
        ⇒  γ_1 α γ_2                     ⇒  γ_1 β γ_2
        ⇒* γ_1 γ_2

Consider the information available to a predictive parser when it has to choose
how to expand A. It has consumed the input up through γ_1. So, when it looks one
character ahead, it will find the first character of γ_2 (in case A ⇒ α ⇒* ε) or it will
find the first character of β (in case A ⇒ β). So we require that there be no terminal
symbol that is an element of both follow(A) (which describes the possible first
terminal symbols in γ_2) and first(β).
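
The functions first and follow, and the three LL(1) conditions, can be computed by a standard fixed-point iteration. The sketch below is one way to do it, not a method taken from the text. It assumes that a grammar is a dictionary from nonterminals to lists of right-hand sides, that [] stands for ε, and that any symbol that is not a key is a terminal. All of the function names are hypothetical.

```python
EPS = "ε"

def first_of(seq, first):
    """Return first(seq) for a sequence of grammar symbols."""
    out = set()
    for sym in seq:
        f = first.get(sym, {sym})      # a terminal's first set is itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)                       # every symbol in seq can derive ε
    return out

def compute_first(rules):
    first = {a: set() for a in rules}
    changed = True
    while changed:                     # iterate until nothing new is added
        changed = False
        for a, alternatives in rules.items():
            for rhs in alternatives:
                new = first_of(rhs, first)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first

def compute_follow(rules, first):
    follow = {a: set() for a in rules}
    changed = True
    while changed:
        changed = False
        for a, alternatives in rules.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in rules:
                        continue                   # only nonterminals
                    tail = first_of(rhs[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:                # rest of rhs can vanish
                        new |= follow[a]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

def is_ll1(rules):
    """Check the three conditions above for every pair of competing rules."""
    first = compute_first(rules)
    follow = compute_follow(rules, first)
    for a, alternatives in rules.items():
        for i, alpha in enumerate(alternatives):
            for beta in alternatives[i + 1:]:
                fa, fb = first_of(alpha, first), first_of(beta, first)
                if fa & fb:            # shared terminal, or both derive ε
                    return False
                if EPS in fa and follow[a] & fb:
                    return False
                if EPS in fb and follow[a] & fa:
                    return False
    return True

EXAMPLE_15_7 = {
    "S": [["A", "X", "B", "$"]],
    "A": [["a", "A"], []],
    "X": [["c"], []],
    "B": [["b", "B"], []],
}
```

Running compute_first and compute_follow on EXAMPLE_15_7 gives the sets listed in Example 15.7.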

We define a language to be LL(k) iff there exists an LL(k) grammar for it. Not all
context-free languages are LL(k) for any fixed k. In particular, no inherently ambiguous
one is, since every LL(k) grammar is unambiguous. There are also languages for
which there exists an unambiguous grammar but no LL(k) one. For example, consider
{a^n b^n c^m d : n, m ≥ 0} ∪ {a^n b^m c^m e : n, m ≥ 0}, which is unambiguous, but not LL(k),
for any k, since there is no fixed bound on the number of lookahead symbols that must
be examined in order to determine whether a given input string belongs to the first or
the second sublanguage. There are even deterministic context-free languages that are
not LL(k), for any k. One such example is {a^n b^n, n ≥ 0} ∪ {a^n c^n, n ≥ 0}. (Intuitively,
the problem there is that, given a string w, it is not possible to determine the first step
in the derivation of w until either a b or a c is read.) But many practical languages are
LL(k). In fact, many are LL(1), so it is worth looking at ways to exploit this property in
the design of a top-down parser.

There are two reasonably straightforward ways to go about building a predictive
parser for a language L that is described by an LL(1) grammar G. We consider each of
them briefly here.


Recursive  Descent  Parsing 

A recursive-descent parser contains one function for each nonterminal symbol A in G.
The argument of each such function is a parse tree node labeled A, and the function's
job is to create the appropriate parse tree beneath the node that it is given. The function
corresponding to the nonterminal A can be thought of as a case statement, with
one alternative for each of the ways that A can be expanded. Each such alternative
checks whether the next chunk of input could have been derived from A using the rule
in question. It checks each terminal symbol directly. To check each nonterminal symbol,
it invokes the function that is defined for it. The name "recursive descent" comes
from the fact that most context-free grammars contain recursive rules, so the parser
will typically exploit recursive function calls.

EXAMPLE 15.8 Recursive Descent Parsing

Let G include the rules:

A → BA | a
B → bB | b

The function associated with A will then be (ignoring many details, including
how the next lookahead symbol is computed, how the parse tree is actually built,
and what happens on input strings that are not in L(G)):

A(n: parse tree node labeled A) =
   case (lookahead = b:   /* Use the rule A → BA.
                          Invoke B on a new daughter node labeled B.
                          Invoke A on a second new daughter node labeled A.
         lookahead = a:   /* Use the rule A → a.
                          Create a new daughter node labeled a.)

Table-Driven  LL(1)  Parsing 

Instead  of  letting  a  set  of  recursive  function  calls  implicitly  maintain  a  stack,  we  could 
build  a  parser  that  works  in  much  the  same  way  that  the  top-down  PDAs  of  Section 
12.3.1  do.  Such  a  parser  would  maintain  its  stack  explicitly. 

Consider all of the transitions that such a parser can take. We can index them in a
table called a parse table, which contains one row for each nonterminal that could be
on the top of the stack and one column for each terminal symbol that could correspond
to the lookahead symbol. Then we can build a straightforward table-driven parser that
chooses its next move by using the current top-of-stack and lookahead symbols as indices
into the table.


EXAMPLE  15.9  Building  a  Parse  Table 

Let G be:

S → AB$ | AC$
A → aA | a
B → bB | b
C → c

The parse table for G would be:

                          Lookahead symbol
Top of stack    a            b            c            $
S               S → AB$
                S → AC$
A               A → aA
                A → a
B                            B → bB
                             B → b
C                                         C → c

Notice  two  things  about  the  parse  table  that  we  just  built: 

• Many of the cells are empty. If the parser looks in the table and finds an empty cell,
it knows that it has hit a dead end: The path it is currently following will never succeed
in parsing the input string.

• Some of the cells contain more than one rule. A parser that used that table as the
basis for choosing its next move would thus be nondeterministic. Suppose, on the
other hand, we could guarantee that the table contained at most one rule in each
cell. Then a parser that was driven by it would be deterministic.


Given any LL(1) grammar G, it is possible to build a parse table with at most one
rule in each cell. Thus it is possible to build a deterministic (predictive) table-driven
parser for G. The parser simply consults the table at each step and applies the rule that
is specified.
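
A table-driven parser of this kind can be sketched in a few lines. The sketch below assumes the left-recursion-free expression grammar of Section 15.2.2 and a parse table for it that was filled in by hand. The table and the names TABLE, NONTERMINALS, and ll1_accepts are illustrative assumptions, not material from the text. Each cell holds the right-hand side of at most one rule, and [] stands for ε.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# Hand-built parse table: (top of stack, lookahead) -> right-hand side.
TABLE = {
    ("E", "id"): ["T", "E'"],
    ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],
    ("E'", ")"): [],
    ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],
    ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [],
    ("T'", ")"): [],
    ("T'", "$"): [],
    ("F", "id"): ["id"],
    ("F", "("): ["(", "E", ")"],
}

def ll1_accepts(tokens):
    """Return True iff tokens (without the final $) is in L(G)."""
    tokens = list(tokens) + ["$"]
    stack = ["$", "E"]                  # start symbol on top of the marker
    pos = 0
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, lookahead))
            if rhs is None:             # empty cell: dead end
                return False
            stack.extend(reversed(rhs)) # leftmost symbol ends up on top
        elif top == lookahead:          # terminal on the stack: match it
            pos += 1
        else:
            return False
    return pos == len(tokens)
```

Because every cell contains at most one rule, the loop never has to backtrack, and each input symbol is matched exactly once.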

Note that the grammar of Example 15.9 is not LL(1) because a is an element of both
first(AB$) and first(AC$). Thus there are two ways to expand S if the lookahead symbol
is a. There are also two ways to expand A if the lookahead symbol is a and two
ways to expand B if the lookahead symbol is b. But the language described by that
grammar is LL(1). We leave the construction of an LL(1) grammar for it as an exercise.

LL(1) parsers can be built by hand, but there exist tools that greatly simplify the
process.


15.3  Bottom-Up  Parsing 

Rather  than  parsing  top-down,  as  we  have  just  described,  an  alternative  is  to  parse 
bottom-up,  and  thus  drive  the  process  directly  by  the  current  string  of  input  symbols. 

A bottom-up parser for a language defined by a grammar G works by creating the
bottom nodes of a parse tree and labeling them with the terminal symbols in the
input. Then it attempts to build a complete parse tree above those nodes. It does this
by applying the rules in R_G backwards. In other words, suppose that a sequence of
nodes labeled x_1, x_2, ..., x_n has already been built and R_G contains the rule:

X → x_1 x_2 ... x_n

Then the parser can build a node, label it X, and insert the nodes labeled x_1, x_2, ..., x_n
as its children. If the parser succeeds in building a tree that spans the entire input and
whose root is labeled S_G, then it has succeeded. If there is no path by which it can do
that, it fails and reports that the input string is not in L(G).

Since there may be choices for which rule to apply at any point in this process, a
bottom-up parser patterned after the PDAs that we built in Section 12.3.1 may be nondeterministic
and its running time may grow exponentially in the length of the input
string. That is clearly unacceptable for a practical parser. In the next section we describe
a straightforward bottom-up parsing algorithm with running time that is O(n^3).
But even that may not be good enough. Fortunately, just as we did for top-down parsing,
we can construct a deterministic parser that runs in time that is O(n) if we impose
some restrictions on the grammars that we use.


15.3.1  The  Cocke-Kasami-Younger  Algorithm 

A straightforward, bottom-up parser that handles nondeterminism by backtracking
typically wastes a lot of time building and rebuilding nodes as it backtracks. An alternative
is a dynamic programming approach in which each possible constituent is built
(bottom-up) exactly once and then made available to any later rules that want to use it.

The Cocke-Kasami-Younger algorithm (also called CKY or CYK) works by storing
such constituents in a two-dimensional table T that contains one column for each input
symbol and one row for each possible substring length. Call the input string w and let
its length be n. Then T contains one row and one column for each integer between 1
and n. We will number the rows of T starting at the bottom and the columns starting
from the left. For all i and j between 1 and n, each cell T[i, j] corresponds to the substring
of w that extends for i symbols and starts in position j. The value that will eventually
fill each such cell will be the set of nonterminal symbols that could derive the
string to which the cell corresponds. For example, to parse the string id + id * id, we
would need to fill in the cells in Table 15.2. Note that each cell is labeled with the substring
to which it corresponds, not with the value it will eventually take on.

Let G be the grammar that is to be used. Initially, each cell in T will be blank. The
parser will begin filling in T, starting from the bottom, and then moving upward to
row n. When it is complete, each cell in the lower triangle of T will contain the set of
nonterminal symbols that could have generated the corresponding substring. If the
start symbol of G occurs in T[n, 1], then G can generate the substring that starts in
position 1 and has length n. But that is exactly w. So G can generate w.


Table 15.2 The table that a CKY parser builds. Each cell will eventually contain the set
of nonterminals that can derive the constituent, shown here, to which the cell corresponds.

Row 5           id + id * id
Row 4           id + id *    + id * id
Row 3           id + id      + id *       id * id
Row 2           id +         + id         id *         * id
Row 1           id           +            id           *            id
Input string:   id           +            id           *            id



The CKY algorithm requires that the grammar that it uses be in Chomsky normal
form. Recall that, in a Chomsky normal form grammar, all rules have one of the following
two forms:

• X → a, where a ∈ Σ, or

• X → BC, where B and C are elements of V − Σ.

So we need two separate techniques for filling in T:

• To fill in row 1, use rules of the form X → a. In particular, if X → a and a is the symbol
associated with column j, then add X to T[1, j].

• To fill in rows 2 through n, use rules of the form X → BC, since they are the ones
that can combine constituents to form larger ones. Suppose the parser is working
on some cell in row k. It wants to determine whether the rule X → BC can be used
to generate the corresponding substring s of length k. If it can, then there must be
some way to divide s into exactly two constituents, one corresponding to B and the
other corresponding to C. Since both of those constituents must be shorter than s,
any ways there are of building them must already be represented in cells in rows
below k.

We can now state the CKY algorithm as follows:

CKY(G: Chomsky normal form grammar, w = a_1 a_2 ... a_n: string) =

   /* Fill in the first (bottom-most) row of T, checking each symbol in w and
      finding all the nonterminals that could have generated it.

   1. For j = 1 to n do:
         If G contains the rule X → a_j, then add X to T[1, j].

   /* Fill in the remaining rows, starting with row 2 and going upward.

   2. For i = 2 to n do:                /* For each row after the first.
         For j = 1 to n - i + 1 do:     /* For each column in the lower triangle of T.
            For k = 1 to i - 1 do:      /* For each character after which there
                                           could be a split into two constituents.
               For each rule X → YZ do:
   ####           If Y ∈ T[k, j] and Z ∈ T[i - k, j + k], then:   /* Y and Z found.
                     Insert X into T[i, j].

   3. If S_G ∈ T[n, 1] then accept else reject.

The core matching operation occurs in the step flagged with ####. The parser must
determine whether X could have generated the substring that starts in position j and
has length i. It is currently considering splitting that substring after the kth symbol. So it
checks whether Y could have generated the first piece, namely the one that starts in
position j and has length k. And it checks whether Z could have generated the second
piece, namely the one that starts in position j + k and whose length is equal to the
length of the original substring minus the part that Y matched. That is i - k.
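
The three steps of CKY can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: the function names cky_table and cky, and the representation of the grammar as a list of (X, a) pairs and a list of (X, Y, Z) triples, are assumptions made for the sketch. The table T is indexed as in the text, with T[i, j] holding the nonterminals that derive the substring of length i that starts in position j. The grammar used to exercise it is the one from Example 15.10.

```python
from collections import defaultdict


def cky_table(unary_rules, binary_rules, w):
    """Build the CKY table T for the sequence w.

    unary_rules:  (X, a) pairs, one per rule X -> a
    binary_rules: (X, Y, Z) triples, one per rule X -> Y Z
    T[i, j] is the set of nonterminals that derive the substring of
    length i starting at position j (both 1-based, as in the text).
    """
    n = len(w)
    T = defaultdict(set)
    # Step 1: fill in row 1 from the rules X -> a.
    for j in range(1, n + 1):
        for X, a in unary_rules:
            if a == w[j - 1]:
                T[1, j].add(X)
    # Step 2: fill in rows 2..n from the rules X -> Y Z.
    for i in range(2, n + 1):              # length of the substring
        for j in range(1, n - i + 2):      # position where it starts
            for k in range(1, i):          # length of the first piece
                for X, Y, Z in binary_rules:
                    if Y in T[k, j] and Z in T[i - k, j + k]:
                        T[i, j].add(X)
    return T


def cky(unary_rules, binary_rules, start, w):
    """Step 3: accept iff the start symbol derives all of w."""
    n = len(w)
    # This sketch treats the empty string as rejected.
    return n > 0 and start in cky_table(unary_rules, binary_rules, w)[n, 1]


# The grammar of Example 15.10.
UNARY = [("A", "a"), ("B", "a"), ("B", "b")]
BINARY = [("S", "A", "B"), ("A", "A", "A")]

print(cky(UNARY, BINARY, "S", "aab"))
```

Using a dictionary keyed by (i, j) keeps the indices identical to those in the pseudocode; a list of lists would be faster but would need 0-based index arithmetic.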
EXAMPLE 15.10 The CKY Algorithm

Consider parsing the string aab with the grammar:

S → AB
A → AA
A → a
B → a
B → b

CKY begins by filling in the bottom row of T as follows:

Row 3
Row 2
Row 1          A,B    A,B    B
Input string:  a      a      b

Notice that, at this point, the algorithm has no way of knowing whether the a’s
were generated by A or by B.

Next, the algorithm moves on to step 2. Setting i to 2, it fills in the second row,
corresponding to substrings of length 2, as follows: When i is 2 and j is 1, it is considering
ways of generating the initial substring aa. Setting k to 1, it considers
splitting it into a and a. Then, considering the rule S → AB, it finds the A and the
B that it needs in row 1, so it adds S to T[2, 1]. Similarly for the rule A → AA, so
it adds A to T[2, 1]. It then sets j to 2 and looks at ways of generating substrings that
start in position 2. Setting k to 1, it considers splitting ab into a and b. Considering
the rule S → AB, it finds the A and the B that it needs, so it adds S to T[2, 2]. At this
point, T is:

Row 3
Row 2          S,A    S
Row 1          A,B    A,B    B
Input string:  a      a      b

Next CKY sets i to 3. So it is considering strings of length 3. There is only one,
namely the one that starts at position 1. So the only value of j that will be considered
is 1. There are now two values of k to consider, since there are two ways that
the string aab can be split in two. Setting k to 1, it is considering the constituents a
and ab. Considering the rule S → AB, it looks for an A of length 1 starting in
position 1 (which it finds) and a B of length 2 starting in position 2 (which it fails to
find). It then considers the other rule, A → AA. For this rule to succeed there
would have to be an A of length 1 in position 1 (which it finds) and a second A of
length 2 starting in position 2 (which it fails to find). Notice that, since it needs an
A of length 2, it must look in row 2. The A in row 1 doesn’t help. So it has found
nothing by breaking the string after position 1. It sets k to 2 and considers breaking
it, after position 2, into aa and b. Now it again tries the first rule, S → AB. It
looks in row 2 for an A that generated aa and in row 1 for a B that generated b. It
finds both and inserts S into T[3, 1]. So it accepts.

The algorithm that we just presented does not actually build a parse tree. It simply
decides whether the string w is in L(G). It can easily be modified to build parse trees as
it applies rules. Then the final parse tree for w is the one associated with the start symbol
in T[n, 1]. If G is ambiguous, there may be more than one such tree.
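
One possible way to make that modification is sketched below in Python. It is an illustration under stated assumptions, not the only way to do it: the name cky_parse, the back-pointer dictionary back, and the nested-tuple representation of trees are all choices made for this sketch. Each time a nonterminal X is first inserted into a cell, the sketch records how it was built, and it then follows those records down from the start symbol in T[n, 1]. It keeps only the first way found of building each entry, so for an ambiguous grammar it returns just one of the possible trees.

```python
def cky_parse(unary_rules, binary_rules, start, w):
    """Return one parse tree for w as nested tuples, or None if w is rejected.

    back[(i, j, X)] records how X was derived for the substring of length i
    starting at position j: a terminal in row 1, or a triple (k, Y, Z)
    meaning "split after k symbols and use the rule X -> Y Z" in higher rows.
    """
    n = len(w)
    back = {}
    for j in range(1, n + 1):
        for X, a in unary_rules:
            if a == w[j - 1]:
                back[(1, j, X)] = a
    for i in range(2, n + 1):
        for j in range(1, n - i + 2):
            for k in range(1, i):
                for X, Y, Z in binary_rules:
                    if (k, j, Y) in back and (i - k, j + k, Z) in back:
                        # Keep only the first derivation found for this entry.
                        back.setdefault((i, j, X), (k, Y, Z))

    def build(i, j, X):
        entry = back[(i, j, X)]
        if i == 1:
            return (X, entry)                      # leaf: X -> terminal
        k, Y, Z = entry
        return (X, build(k, j, Y), build(i - k, j + k, Z))

    if (n, 1, start) not in back:
        return None
    return build(n, 1, start)


# The grammar of Example 15.10.
UNARY = [("A", "a"), ("B", "a"), ("B", "b")]
BINARY = [("S", "A", "B"), ("A", "A", "A")]

print(cky_parse(UNARY, BINARY, "S", "aab"))
```

To return every tree for an ambiguous grammar, each entry of back would have to hold a list of derivations instead of a single one.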

We  can  analyze  the  complexity  of  CKY  as  follows:  We  will  assume  that  the  size  of 
the  grammar  G  is  a  constant,  so  any  operation  whose  complexity  is  dependent  only  on 
the  size  of  G  takes  constant  time.  This  means  that  the  code  inside  the  loop  of  step  1  and 
the  testing  of  all  grammar  rules  that  is  done  inside  the  loop  of  step  2  each  take  constant 
time.  Step  1  takes  time  that  is  O(n).  Step  2  can  be  analyzed  as  follows: 

• The outer loop (i) is executed n - 1 times.

• The next loop (j) is executed, on average, n/2 times and at most n - 1 times.

• The next loop (k) is also executed, on average, n/2 times and at most n - 1 times.

• The inner loop takes constant time.


So step 2 takes time that is O((n - 1)(n/2)(n/2)) = O(n^3). Step 3 takes constant
time. So the total time is O(n^3).
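
The cubic bound can be checked by counting how many times the body of the k loop in step 2 runs. The short Python sketch below (the function name is an assumption made for illustration) mirrors the loop bounds of the pseudocode and compares the count with the closed form (n^3 - n)/6, which follows from summing (n - i + 1)(i - 1) for i from 2 to n.

```python
def cky_inner_iterations(n):
    """Count executions of the body of the k loop in step 2 of CKY."""
    count = 0
    for i in range(2, n + 1):            # rows 2..n
        for j in range(1, n - i + 2):    # columns 1..n-i+1
            for k in range(1, i):        # split points 1..i-1
                count += 1
    return count


for n in (1, 2, 3, 10, 20):
    # The two printed numbers agree for every n.
    print(n, cky_inner_iterations(n), (n**3 - n) // 6)
```

The exact count is about n^3/6, which is smaller than the estimate (n - 1)(n/2)(n/2) by a constant factor, so both give O(n^3).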

If we also want to consider the size of G, then let |G| be the number of rules in G. If
G is in Chomsky normal form, CKY takes time that is O(n^3 · |G|). But if G is not already
in Chomsky normal form, it must first be converted, and that process can take
time that is O(|G|^2). So we have that the total time required by CKY is O(n^3 · |G|^2).


15.3.2  Context-Free  Parsing  and  Matrix  Multiplication 

The CKY algorithm can be described in terms of Boolean matrix multiplication. Stated
that way, its time efficiency depends on the efficiency of Boolean matrix multiplication.
In particular, again assuming that the size of the grammar is constant, the running time
becomes O(M(n)), where M(n) is the time required to multiply two n × n Boolean
matrices. Straightforward matrix multiplication algorithms (such as Gaussian elimination)
take time that is O(n^3), so this recasting of the algorithm has no effect on its complexity.
But faster matrix multiplication algorithms exist. For example, Strassen’s
algorithm (described in Exercise 27.9) reduces the time to O(n^2.807), but at a price of
increased complexity and a structure that makes it less efficient for small to medium
values of n. The fastest known technique, the Coppersmith-Winograd algorithm, has
worst case running time that is O(n^2.376), but it is too complex to be practical.
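
For reference, the straightforward O(n^3) way to multiply two n × n Boolean matrices can be sketched as follows. This shows only the baseline operation whose cost is M(n); it is not the reformulation of CKY itself, and the function name bool_matmul is an assumption made for illustration. Entry (i, j) of the product is true iff there is some k for which A[i][k] and B[k][j] are both true.

```python
def bool_matmul(A, B):
    """Multiply two n x n Boolean matrices given as lists of lists of bools."""
    n = len(A)
    return [
        [any(A[i][k] and B[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


A = [[True, True], [False, False]]
B = [[False, False], [False, True]]
print(bool_matmul(A, B))
```

The three nested ranges over n are what give the cubic running time that the faster algorithms improve on.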

More recently, a further result that links matrix multiplication and context-free
parsing has been shown: Let P be any context-free parser with time complexity
O(g · n^(3-ε)), where g is the size of the grammar and n is the length of the input string. Then
P can be efficiently converted into an algorithm to multiply two n × n Boolean matrices
in time O(n^(3-ε/3)). So, if there were a fast algorithm for parsing arbitrary context-free
languages, there would also be a fast matrix multiplication algorithm. Substantial
effort over the years has been expended looking for a fast matrix multiplication algorithm,
and none has been found. So it appears relatively unlikely that there is a fast general
algorithm for context-free parsing.

15.3.3  Shift-Reduce  Parsing 

The  CKY  algorithm  works  for  any  context-free  language,  but  it  has  two  important 
limitations: 

•  It  is  not  efficient  enough  for  many  applications.  We’d  like  a  deterministic  parser 
that  runs  in  time  that  is  linear  in  the  length  of  the  input. 

•  It  requires  that  the  grammar  it  uses  be  stated  in  Chomsky  normal  form.  We’d  like  to 
be  able  to  use  more  natural  grammars  and  thus  to  extract  more  natural  parse  trees. 

We’ll  next  consider  a  bottom-up  technique  that  can  be  made  deterministic  for  a 
large,  practically  significant  set  of  languages. 

The parser that we are about to describe is called a shift-reduce parser. It will read
its input string from left to right and can perform two basic operations:

1. Shift an input symbol onto the parser’s stack and build, in the parse tree, a terminal
node labeled with that input symbol.

2. Reduce a string of symbols from the top of the stack to a nonterminal symbol,
using one of the rules of the grammar. Each time it does this, it also builds the
corresponding piece of the parse tree.

We’ll begin by considering a shift-reduce parser that may have to explore more than
one path. Then we’ll look at ways to make it deterministic in cases where that is possible.

To see how a shift-reduce parser might work, let’s trace its operation on the string
id + id * id, using our original term/factor grammar for arithmetic expressions:

1. E → E + T

2. E → T

3. T → T * F

4. T → F

5. F → (E)

6. F → id

We’ll  number  the  main  steps  in  this  process  so  that  we  can  refer  to  them  later. 

Step 1: When we start, the parser’s stack is empty, so our only choice is to shift the first
input symbol, id, onto the stack. Next, we have a choice. We can either use rule 6 to
reduce id to F, or we can get the next input symbol and shift it onto the stack. It’s
clear that we need to apply rule 6 now. Why? Because there are no other rules that can
consume an id directly. So we have to do this reduction before we can do anything else
with id. But could we wait and do it later? No, because reduction always applies to the
symbols at the top of the stack. If we push anything on before we reduce id, we’ll
never again get id at the top of the stack. It will just sit there, unable to participate in
any rules. So we reduce id to F, giving us a stack containing just F, and the parse tree
(remember we’re building it up from the bottom):

     F
     |              Stack: F
     id

The reasoning we just did is going to be the basis for the design of a “smart” deterministic
bottom-up parser. Without that reasoning, a dumb, brute force parser would
have had to consider both paths at this first choice point: the one we took, as well as the
one that fails to reduce and instead pushes + onto the stack. That second path will
eventually reach a dead end, so even a brute force parser will eventually get the right
answer. But our goal is to eliminate search.

Step 2: At this point, the parser’s stack contains F and the remaining input is + id * id.
Again we must choose between reducing the top of the stack or shifting on the next input
symbol. Again, by looking ahead and analyzing the grammar, we can see that eventually
we will need to apply rule 1. To do so, the first id will have to have been reduced to a T
and then to an E. So let’s next reduce by rule 4 and then again by rule 2, giving the parse
tree and stack:

     E
     |
     T
     |              Stack: E
     F
     |
     id

Step 3: At this point, there are no further reductions to consider, since there are no
rules whose right-hand side is just E. So we must consume the next input
symbol + and shift it onto the stack. Having done that, there are again no available
reductions. So we shift the next input symbol. The stack then contains id + E (writing
the stack with its top to the left). Again, we need to reduce id before we can do anything
else, so we reduce it to F and then to T. Now we’ve got:

     E
     |
     T          T
     |          |
     F          F
     |          |
     id    +    id              Stack: T + E
Notice that we have three parse tree fragments. Since we’re working up from the
bottom, we don’t know yet how they’ll get put together. We now have three choices:
Reduce T to E using rule 2, reduce T + E to E using rule 1, or shift on the next input
symbol. Note, by the way, that we will always be matching the right-hand sides of the
rules in reverse because the last symbol we read (and thus the right-most one we’ll
match) is at the top of the stack.

Step 4: In considering whether to reduce or shift at this point, we realize that, for the
first time, there isn’t one correct answer for all input strings. When there was just one
universally correct answer, we could compute it simply by examining the grammar.
Now we can’t do that. In the example we’re working with, we don’t want to do either of
the reductions, since the next input symbol is *. We know that the only complete parse
tree for this input string will correspond to the interpretation in which * is applied before
+. That means that + must be at the top of the tree. If we reduce now, it will be at
the bottom. So we need to shift * onto the stack and do a reduction that will build the
multiplication piece of the parse tree before we do a reduction involving +. But if the
input string had been id + id + id, we’d want to reduce now in order to cause the first
+ to be done first, thus producing left associativity. So we appear to have reached a
point where we’ll have to branch. If we choose the wrong path, we’ll eventually hit a
dead end and have to back up. We’d like not to waste time exploring dead end paths,
however. We’ll come back later to the question of how we can make a parser know how
to avoid dead ends. For now, let’s just forge ahead and do the right thing and see what
happens.

As we said, what we want to do here is not to reduce but instead to shift * onto the
stack. Once we do that, the stack will contain * T + E. At this point, there are no
available reductions (since there are no rules whose right-hand side contains * as the
last symbol), so we shift the next symbol, resulting in the stack id * T + E. Clearly
we next have to reduce id to F (following the same argument that we used above), so
we’ve got:

     E
     |
     T          T
     |          |
     F          F          F
     |          |          |
     id    +    id    *    id              Stack: F * T + E



Step 5: Next, we need to reduce (since there aren’t any more input symbols to shift),
but now we have another decision to make: Should we reduce the top F to T, using
rule 4, or should we reduce the top three symbols to T, using rule 3? The right answer
is to use rule 3, which replaces F * T on the stack with T and builds the multiplication
piece of the parse tree. The stack now contains T + E.

Step 6: Finally, we need to apply rule 1, to produce the single symbol E on the top of the
stack, and the parse tree:

                 E
              /  |  \
            E    +    T
            |       / | \
            T      T  *  F
            |      |     |
            F      F     id
            |      |
            id     id


The  job  of  a  shift-reduce  parser  is  complete  once  it  has: 


•  built  a  parse  tree  that  spans  the  entire  input  string,  and 

•  produced  a  stack  that  contains  just  the  start  symbol. 

So  we  are  done,  although  we'll  discuss  below  extending  the  input  to  include  an  end- 
of-input  symbol  $.  In  that  case,  we  will  usually  have  a  final  rule  that  consumes  the  $  and 
pops  the  start  symbol  from  the  stack. 
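
The “dumb, brute force” version of this parser, the one that explores both branches at every choice point, can be sketched in Python as follows. This is an illustration under stated assumptions: the name brute_force_parse and the encoding of the grammar as (left-hand side, right-hand side) pairs are choices made for the sketch, and the stack is written with its top at the right end of a tuple rather than at the left as in the trace above. At each configuration the sketch first tries every reduction whose right-hand side matches the top of the stack and then tries a shift, backing up whenever a path reaches a dead end.

```python
# The term/factor grammar, rules 1 through 6.
RULES = [
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]


def brute_force_parse(tokens, start="E"):
    """Return True iff some sequence of shifts and reductions accepts tokens."""
    tokens = tuple(tokens)
    seen = set()   # configurations already explored (avoids repeated work)

    def explore(stack, pos):
        if (stack, pos) in seen:
            return False
        seen.add((stack, pos))
        # Done: all input consumed and the stack holds just the start symbol.
        if pos == len(tokens) and stack == (start,):
            return True
        # Choice 1: any reduction whose right-hand side is on top of the stack.
        for lhs, rhs in RULES:
            if len(rhs) <= len(stack) and stack[-len(rhs):] == rhs:
                if explore(stack[:-len(rhs)] + (lhs,), pos):
                    return True
        # Choice 2: shift the next input symbol, if there is one.
        if pos < len(tokens):
            return explore(stack + (tokens[pos],), pos + 1)
        return False   # dead end: back up

    return explore((), 0)


print(brute_force_parse(["id", "+", "id", "*", "id"]))
```

The search terminates for this grammar because it has no ε-rules and its unit rules (E → T, T → F) cannot cycle, so only finitely many stacks are reachable for a given input.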

Now  let's  return  to  the  question  of  how  we  can  build  a  parser  that  makes  the  right 
choices  at  each  step  of  the  parsing  process.  As  we  walked  through  the  example  parse 
above,  there  were  two  kinds  of  decisions  that  we  had  to  make: 


•  Whether  to  shift  or  reduce  (we’ll  call  these  shift-reduce  conflicts). 

•  Which  of  several  available  reductions  to  perform  (we’ll  call  these  reduce-reduce 
conflicts). 

Let’s focus first on shift-reduce conflicts. At least in this example, it was always possible
to make the right decision on these conflicts if we had two kinds of information:

• The symbol that is currently on the top of the stack, coupled with a good understanding
of what is going on in the grammar. For example, we noted that there’s
nothing to be done with a raw id that hasn’t been reduced to an F.

• A peek at the next input symbol (the one that we’re considering shifting), which we
call the lookahead symbol. For example, when we were trying to decide whether to reduce
T + E or shift on the next symbol, we looked ahead and saw that the next symbol
was *. Since we know that * has higher precedence than +, we knew not to reduce
+, but rather to wait and deal with * first. In order to guarantee that there always is a
lookahead symbol, even after the last real input symbol has been read, we’ll assume
from now on that every string ends with $, a special end-of-input symbol.

If the decision about whether to shift or to reduce is dependent only on the current top
of stack symbol and the current lookahead symbol, then we can define a procedure for
resolving shift-reduce conflicts by specifying a precedence relation P ⊆ V × (Σ ∪ {$}).
P will contain the pair (s, c) iff, whenever the top of stack symbol is s and the lookahead
symbol is c, the parser should reduce. If the current situation is described by a pair that is
not in P, then the parser will shift the lookahead symbol onto the stack.

An easy way to encode a precedence relation is as a table, which we’ll call a
precedence table. As an example, consider the following precedence table for our arithmetic
expression grammar, augmented with $:

This table should be read as follows: Compare the left-most column to the top of the
stack  and  find  the  row  that  matches.  Now  compare  the  symbols  along  the  top  of  the 
chart  to  the  lookahead  symbol  and  find  the  column  that  matches.  If  there’s  an  R  in  the 
corresponding  square  of  the  table,  then  reduce.  Otherwise,  shift. 

Let’s now go back to the problem of parsing our example input string, id + id * id.
Remember that we had a shift-reduce conflict at step 4, when the stack’s contents were
T + E and the next input symbol was *. We can now resolve that conflict by checking
the precedence table. We look at the next to the last row of the table, the one that has
T as the top of stack symbol. Then we look at the column headed *. There’s no R, so we
don’t reduce. But notice that if the lookahead symbol had been +, we’d have found an
R, telling us to reduce, which is exactly what we’d want to do. Thus this table captures
the precedence relationships between the operators * and +, plus the fact that we want
to associate left when faced with operators of equal precedence.

Now consider the problem of resolving reduce-reduce conflicts. Here’s a simple
strategy called the longest-prefix heuristic: Given a choice of right-hand sides that
match the current stack, choose the longest one.

Returning to our example parse, we encountered a reduce-reduce conflict at step 5.
The longest-prefix heuristic tells us to reduce F * T rather than just F, which is the right
thing to do.
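
Putting the two ideas together gives a deterministic shift-reduce parser for the arithmetic expression grammar. The Python sketch below is illustrative only. In particular, the entries of REDUCE_ON are an assumption: they were chosen to agree with the behavior described above (for example, with T on top of the stack, reduce when the lookahead is + but shift when it is *) and are not copied from the precedence table. The names shift_reduce, RULES, and REDUCE_ON are likewise hypothetical. The stack is kept with its top at the end of a Python list, and the function returns the rule numbers in the order in which they were applied.

```python
# The term/factor grammar, keyed by rule number.
RULES = {
    1: ("E", ("E", "+", "T")),
    2: ("E", ("T",)),
    3: ("T", ("T", "*", "F")),
    4: ("T", ("F",)),
    5: ("F", ("(", "E", ")")),
    6: ("F", ("id",)),
}

# ASSUMED precedence relation P: top-of-stack symbol -> lookaheads on which
# to reduce. Any (top, lookahead) pair not listed here means "shift".
REDUCE_ON = {
    ")":  {")", "+", "*", "$"},
    "id": {")", "+", "*", "$"},
    "T":  {")", "+", "$"},
    "F":  {")", "+", "*", "$"},
}


def shift_reduce(tokens, start="E"):
    """Return the list of rule numbers applied, or None if tokens is rejected."""
    remaining = list(tokens) + ["$"]     # every input ends with $
    stack, applied, pos = [], [], 0
    while True:
        lookahead = remaining[pos]
        if stack == [start] and lookahead == "$":
            return applied               # whole input reduced to the start symbol
        top = stack[-1] if stack else None
        if top is not None and lookahead in REDUCE_ON.get(top, ()):
            # Longest-prefix heuristic: among the right-hand sides that match
            # the top of the stack, choose the longest one.
            matches = [(len(rhs), num) for num, (lhs, rhs) in RULES.items()
                       if tuple(stack[-len(rhs):]) == rhs]
            if not matches:
                return None              # told to reduce, but nothing matches
            _, num = max(matches)
            lhs, rhs = RULES[num]
            del stack[-len(rhs):]
            stack.append(lhs)
            applied.append(num)
        elif lookahead != "$":
            stack.append(lookahead)      # shift
            pos += 1
        else:
            return None                  # nothing left to shift


print(shift_reduce(["id", "+", "id", "*", "id"]))
```

On id + id * id the reductions come out in the same order as in steps 1 through 6 of the trace: rules 6, 4, 2, then 6, 4, then 6, 3, and finally 1. Read in reverse, that sequence is a rightmost derivation of the string.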

15.3.4 Deterministic, Bottom-Up LR Parsing

There is a large and very useful class of languages for which it is possible to build a
deterministic, bottom-up parser by extending the notion of a precedence table so that it
includes even more information about paths that will eventually succeed versus those
that will eventually fail. We’ll call the resulting table a parse table.
We define a grammar G to be LR(k), for any positive integer k, iff it is possible to
build a deterministic parser for G that scans its input left to right (thus the L in the
name) and, for any input string in L(G), builds a rightmost derivation (thus the R in
the name), looking ahead at most k symbols. We define a language to be LR(k) iff
there exists an LR(k) grammar for it. We’ll state here, without proof, two important
facts about the LR(k) languages:

• The class of LR(k) languages is exactly the class of deterministic context-free languages,
as defined in Section 13.5.

• If a language is LR(k), for some k, then it is also LR(1).

Given an LR(1) grammar, it is possible to build a parse table that can serve as the
basis for a deterministic shift-reduce parser. The parse table, like the precedence table
we built in the last section, tells the parser when to shift and when to reduce. It also tells
it how to resolve reduce-reduce conflicts. Unfortunately, for many LR(1) languages,
the parse table is too large to be practical.

But there is a technique, called LALR (lookahead LR) parsing, that works on a
restricted class of LR(1) grammars. LALR parsers are deterministic, shift-reduce
parsers. They are widely used for a combination of three important reasons:

• Most practical languages can be described by an LALR grammar.

• The parse tables that are required by an LALR parser are reasonably small.

• There exist powerful tools to build those tables. So efficient parsers are very easy
to build.

This last point is key. While it is possible to build parse tables for top-down LL
parsers by hand, it isn’t possible, for any but the simplest grammars, to build LALR
parse tables by hand. As a result, bottom-up parsing was not widely used until the
development of parser-generation tools. The most influential such tool has been Yacc,
which is designed to work together with Lex (described briefly in Section 15.1) to build
a combined lexical analyzer/parser. There have been many implementations of Yacc and
it has many descendants.


15.4  Parsing  Natural  Languages 

Programming languages are artificial. They are designed by human designers, who are
free to change them so that they possess various desirable properties, including
parsability. But now consider English or Spanish or Chinese or Swahili. These languages
are natural. They have evolved to serve a purpose, but that purpose is communication
among people. The need to build programs to analyze them, index them,
retrieve them, translate them, and so forth, has been added very late in the game. It
should therefore come as little surprise that the efficient parsing techniques that we
have described in the last two sections do not work as well for natural languages as
they do for artificial ones.

15.4.1 The Problems

Parsers for natural languages must face at least five problems that are substantially
more severe in the case of natural languages than they are for artificial ones:

• Ambiguity: There do not exist unambiguous grammars with the power to generate
all the parse trees that correspond to the meanings of sentences in the language.
Many sentences are syntactically ambiguous. (Recall Example 11.22.) Choosing the
correct parse tree for a sentence generally requires appeal to facts about the larger
context in which the sentence occurs and facts about what makes sense. Those facts
can be encoded in separate functions that choose from among a set of parse trees or
partial parse trees. Or they may be encoded probabilistically in a stochastic grammar,
as described in Section 11.10. Even when the information required to make a
choice is available in the input string, it may be many words away, so the single symbol
lookahead that we used in LL and LR parsing is rarely adequate.

• Gaps: In the sentence, What did Jen eat?, the word What is the object of the verb
eat but it is not near it in the sentence. See L.3.3 for a discussion of this issue.

• Dialect: English is not one language. It is hundreds at least. Chinese is worse. There
is no ISO standard for English or Chinese. So what language should we build a
grammar for?

• Evolution: Natural languages change as they are used. The sentences, You wanted
to do that why? and They’re open 24/7, are fine American English sentences
today. But they wouldn’t have been twenty years ago.

• Errors: Even among speakers who agree completely on how they ought to talk,
what they actually say is a different story. While it is acceptable (and even desirable)
for a compiler to throw out syntactically ill-formed programs, imagine the
usefulness of a translating telephone that objected to every sentence that stopped
in the middle, started over, and got a pronoun wrong. Parsers for natural languages
must be robust in a way that parsers for artificial languages are not required to be.

In  addition,  natural  languages  share  with  many  artificial  languages  the  problem  of 
checking  for  agreement  between  various  constituents: 

•  For  programming  languages,  it  is  necessary  to  check  variable  declarations  against  uses. 

•  For  natural  languages,  it  is  necessary  to  check  for  agreement  between  subject  and  verb, 
for  agreement  between  nouns  and  modifiers  (in  languages  like  Spanish),  and  so  forth. 

In  G.2,  we  prove  that  one  typical  programming  language,  Java,  is  not  context-free 
because  of  the  requirement  that  variables  be  declared  before  they  are  used.  So  parsers 
for  programming  languages  exploit  additional  mechanisms,  such  as  symbol  tables,  to 
check  such  features.  In  L.3.3,  we  address  the  question  of  whether  natural  languages  such 
as  English  are  formally  context-free.  There  are  no  proofs,  consistent  with  the  empirical 
facts  about  how  people  actually  talk,  that  English  is  not  context-free.  There  is,  on  the 
other hand, a proof that one grammatical feature of one natural language, Swiss German,
is not context-free. But, even for English, it is more straightforward to describe
agreement features, just as we do for Java, with additional mechanisms that check
agreement features.

15.4.2  The  Earley  Algorithm 

Despite the problems we just described, context-free parsers, augmented as necessary,
are widely used in natural language processing systems. Although efficient parsing is
important, deterministic parsing is generally not possible. So LL and LR techniques
won’t work. And it is usually not acceptable to require grammars to be in Chomsky
normal form (because parsers based on such grammars will not generate natural parse
trees). So the CKY algorithm can’t be used. The Earley algorithm, which we present
next, is a reasonably efficient (O(n^3)) algorithm that works with an arbitrary context-free
grammar. Because of its importance in natural language processing, we’ll describe
this algorithm as it operates on English sentences. But it can be applied to any context-free
language. So, for example, to imagine it being used in a compiler, substitute the
term “token” each time we mention a “word” here.

The Earley algorithm, like the CKY algorithm, is a dynamic programming technique.
It works top-down but, unlike the simple depth-first search algorithm that we
discussed in Section 15.2.1, it builds each potential constituent only once (and then
may reuse smaller constituents as it explores alternative ways to build larger ones). The
structure that is used to record the constituents as they are found can be thought of as
a simple chart. So parsers that are based on the Earley algorithm, and on others that
are related to it, are often called chart parsers.

To describe the way that the Earley algorithm works, we introduce the dot notation,
which we will use to indicate the progress that the parser has made so far in matching
the right-hand side of a grammar rule against the input. Let:

A → α • βγ   describe an attempt to apply the rule A → αβγ, where everything
             before the • has already matched against the input and the parser is
             still trying to match everything after the •.

A → • αβγ    describe a similar attempt except that nothing has yet matched
             against the input.

A → αβγ •    describe a similar attempt except that the entire right-hand side
             (and thus also A) has matched against the input.

The  overall  progress  of  the  parsing  process  can  be  described  by  listing  each  rule  that 
is  currently  being  attempted  and  indicating,  for  each: 

•  Where in the sentence the parser is trying to match the right-hand side, and

•  How much progress (as indicated by the position of the dot) has been made in
doing that matching.

All of this information can be summarized in a chart with n + 1 rows, where n is the
number of words in the input string. In creating the chart, we won't assign indices to the
words of the input string. Instead we'll assign the indices to the points in between the
words. So, for example, we might have:

0  Jen  1  saw  2  Bill  3 




We'll let row i of the chart contain every instance of an attempt to match a rule
whose • is in position i. The easiest way to envision the chart is to imagine that it also
has n + 1 columns, which will correspond to the location in the input at which the
partial match began. We'll reverse our usual convention and list the column index first so
that the pair describes the start and then the end of a partial match. So associating the
indices [i, j] with a rule A → α • βγ means that the parser began matching α in position
i and the • is currently in position j.

The Earley algorithm works top-down. So it starts by inserting into the chart every
rule whose left-hand side is the start symbol of the grammar. The indices associated with
each such rule are [0, 0], since the parser must try to match the right-hand side starting
before the first word (i.e., in position 0) and it has so far matched nothing. The job of the
parser is to find a match, for the right-hand side of at least one of those rules, that spans
the entire sentence. In other words, for at least one of those initial rules, the • must move
all the way to the right and the index pair [0, n], indicating a match starting before the
first word and ending after the last one, must be assigned to the rule.

To  see  how  the  algorithm  works,  we'll  trace  its  operation  on  the  simple  sentence  we 
showed  above,  given  the  following  grammar: 

S → NP VP

NP → ProperNoun

VP → V NP

After  initialization,  the  chart  will  be: 


3 |
2 |
1 |
0 |   S → • NP VP [0, 0]

      0 Jen 1 saw 2 Bill 3


Next,  the  algorithm  predicts  that  an  NP  must  occur,  starting  at  position  0.  So  it  looks 
for  rules  that  tell  it  how  to  construct  such  an  NP.  It  finds  one,  and  adds  it  to  the  chart, 
giving: 


3 |
2 |
1 |
0 |   NP → • ProperNoun [0, 0]
  |   S → • NP VP [0, 0]

      0 Jen 1 saw 2 Bill 3


Now it predicts the existence of a ProperNoun that starts in position 0. It isn't
generally practical to handle part of speech tags like ProperNoun by writing a rule like
ProperNoun → Jen | Bill | Chris | ... Instead, we'll assume that the input has
already been tagged with part of speech markers like Noun, ProperNoun, Verb, and so
forth. (See L.2 for a discussion of how this process, called part of speech or POS
tagging, is done.) So, whenever the next symbol the parser is looking for is a part of speech
tag, it will simply check the next input symbol and see whether it has the required tag.
If it does, a match has occurred and the parser will behave as though it just matched the
implied rule. If it does not, then no match has been found and the rule can make no
progress. In this case, Jen is a ProperNoun, so there is a match. The parser can apply
the implied rule ProperNoun → Jen. Notice that whenever the parser actually matches
against the input, the • moves. So the parser adds this new rule to the next row of the
chart, which now becomes:

3 |
2 |
1 |   ProperNoun → Jen • [0, 1]
0 |   NP → • ProperNoun [0, 0]
  |   S → • NP VP [0, 0]

      ProperNoun   V, N   ProperNoun
      0 Jen 1 saw 2 Bill 3


The parser has now finished considering both of the rules in row 0, so it moves on to
row 1. It notices that it has found a complete ProperNoun. Whenever it finds a
complete constituent, it must look back to see what rules predicted the occurrence of that
constituent (and thus are waiting for it). The new, complete constituent starts at
position 0, so the parser looks for rules whose • is in position 0, indicating that they are
waiting for a constituent that starts there. So it looks back in row 0. It finds that the NP
rule is waiting for a ProperNoun. Since a ProperNoun has just been found, the parser
can create the rule NP → ProperNoun • [0, 1] and add it to row 1. Then it looks at that
rule and realizes that it has found a complete NP starting in position 0. So it looks back
in row 0 again, this time to see what rule is waiting for an NP. It finds that S is, so it
creates the rule S → NP • VP [0, 1]. At this point, the chart looks like this (using ✓ to
mark rules that have already been processed):


3 |
2 |
1 |   S → NP • VP [0, 1]
  | ✓ NP → ProperNoun • [0, 1]
  | ✓ ProperNoun → Jen • [0, 1]
0 | ✓ NP → • ProperNoun [0, 0]
  | ✓ S → • NP VP [0, 0]

      ProperNoun   V, N   ProperNoun
      0 Jen 1 saw 2 Bill 3


The remaining unprocessed rule tells the parser that it needs to predict again. It
needs to find a VP starting in position 1. Because no progress has been made in
finding a VP, any rule that could describe one will have its • still in position 1. So
the parser adds the rule VP → • V NP [1, 1] to row 1 of the chart. At this point, the
chart will be:


3 |
2 |
1 |   VP → • V NP [1, 1]
  | ✓ S → NP • VP [0, 1]
  | ✓ NP → ProperNoun • [0, 1]
  | ✓ ProperNoun → Jen • [0, 1]
0 | ✓ NP → • ProperNoun [0, 0]
  | ✓ S → • NP VP [0, 0]

      ProperNoun   V, N   ProperNoun
      0 Jen 1 saw 2 Bill 3

In  processing  the  next  rule,  the  parser  notices  that  the  predicted  symbol  is  a  part  of 
speech,  so  it  checks  the  next  input  word  to  see  if  it  can  be  a  Verb.  Saw  has  been  tagged  as 
a  possible  Verb  or  a  possible  Noun.  So  a  new  rule  is  added,  this  time  to  row  2  since  the  • 
moves  to  the  right  to  indicate  that  the  match  has  moved  one  word  farther  in  the  input. 
Notice  that  because  the  Earley  algorithm  works  top-down,  it  will  ignore  part  of  speech 
tags (such as saw as a Noun) that don't fit in the larger sentence context. The chart is now:


3 |
2 |   V → saw • [1, 2]
1 | ✓ VP → • V NP [1, 1]
  | ✓ S → NP • VP [0, 1]
  | ✓ NP → ProperNoun • [0, 1]
  | ✓ ProperNoun → Jen • [0, 1]
0 | ✓ NP → • ProperNoun [0, 0]
  | ✓ S → • NP VP [0, 0]

      ProperNoun   V, N   ProperNoun
      0 Jen 1 saw 2 Bill 3


Having found a complete constituent (the V), starting in position 1, the parser looks
back to row 1 to find rules that are waiting for it. It finds one: VP → • V NP [1, 1]. So
it can advance this rule's • and create the rule VP → V • NP [1, 2], which can be added
to row 2. That rule will be processed next. It will predict the existence of an NP starting
in position 2, so the parser will create rules that describe the possible structures for
such an NP. Our simple grammar has only one NP rule, so the parser will create the
rule NP → • ProperNoun [2, 2] and add it to the chart in row 2. Next the parser looks
for, and finds, a ProperNoun, Bill, starting in position 2 and ending at position 3. So it
enters it in row 3. At this point, the chart will be:

3 |   ProperNoun → Bill • [2, 3]
2 | ✓ NP → • ProperNoun [2, 2]
  | ✓ VP → V • NP [1, 2]
  | ✓ V → saw • [1, 2]
1 | ✓ VP → • V NP [1, 1]
  | ✓ S → NP • VP [0, 1]
  | ✓ NP → ProperNoun • [0, 1]
  | ✓ ProperNoun → Jen • [0, 1]
0 | ✓ NP → • ProperNoun [0, 0]
  | ✓ S → • NP VP [0, 0]

      ProperNoun   V, N   ProperNoun
      0 Jen 1 saw 2 Bill 3


Having found a complete constituent (the ProperNoun), starting in position 2, the
parser looks in row 2 to find rules that are waiting for a ProperNoun starting in position
2. It finds one: NP → • ProperNoun [2, 2]. It can advance that rule's • and add the rule
NP → ProperNoun • [2, 3] to row 3. This rule tells the parser that another complete
constituent, an NP, has been found, starting in position 2. So it again looks back to row 2 and
finds that the rule VP → V • NP [1, 2] is looking for that NP. So its • can be advanced,
and the rule VP → V NP • [1, 3] can be added to row 3. That rule describes yet another
complete constituent, a VP, starting back in position 1. So the parser looks back at row 1
to find a rule that is waiting for that VP. It finds S → NP • VP [0, 1]. So its • can be
advanced, and the rule S → NP VP • [0, 3] can be added to row 3. Now the chart is:


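3 |   S → NP VP • [0, 3]
  | ✓ VP → V NP • [1, 3]
  | ✓ NP → ProperNoun • [2, 3]
  | ✓ ProperNoun → Bill • [2, 3]
2 | ✓ NP → • ProperNoun [2, 2]
  | ✓ VP → V • NP [1, 2]
  | ✓ V → saw • [1, 2]
1 | ✓ VP → • V NP [1, 1]
  | ✓ S → NP • VP [0, 1]
  | ✓ NP → ProperNoun • [0, 1]
  | ✓ ProperNoun → Jen • [0, 1]
0 | ✓ NP → • ProperNoun [0, 0]
  | ✓ S → • NP VP [0, 0]

      ProperNoun   V, N   ProperNoun
      0 Jen 1 saw 2 Bill 3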


At this point, the parsing process halts. A complete S that spans the entire input has
been found. In this simple example, there is only one parse. Given a more complex
sentence and a more realistic grammar, there could be several parses. If we want to find
them all, the parser can be allowed to continue until no new edges can be added.

We can now state the algorithm that we have just described:

Earleyparse(w: input string containing n words, G: context-free grammar) =

1. For every rule in G of the form S → α, where S is the start symbol of G, do:   /* Initialize chart.
      insert(chart, S → • α [0, 0]).   /* Insert the rule S → • α [0, 0] into row 0 of chart.

2. For i = 0 to n do:   /* Go through the rows one at a time.
      For each rule r in row i of chart do:
         If r corresponds to finding a complete constituent, then
            extendothers(chart, r).
         Else if the symbol after the • of r is a part of speech tag, then
            scaninput(w, chart, r).
         Else predict(chart, r).

insert(chart, r [j, k]: rule that spans from j to k in chart) =
   If r is not already on chart, spanning from j to k, then add it in row k.

extendothers(chart: chart, r [j, k]: rule of the form A → α • that spans from j to k in chart) =
   For each rule p of the form X → β • A γ [i, j] on chart do:   /* Find rules waiting for A starting at j.
      insert(chart, X → β A • γ [i, k]).   /* Move the • one symbol to the right and add rule to row k.

scaninput(w: input string, chart: chart, r [j, k]: rule of the form X → β • A γ, where A is a part of speech tag, and the rule spans from j to k in chart) =
   If w_k (the word of the input that starts at position k) has been labeled with the tag A then:
      insert(chart, A → w_k • [k, k + 1]).   /* Add this one to the next row.

predict(chart: chart, r [j, k]: rule of the form A → α • B β that spans from j to k in chart) =
   For each rule in G of the form B → γ do:
      insert(chart, B → • γ [k, k]).   /* Try to find a B starting at k.
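The pseudocode above can be sketched in Python as follows. This is a minimal recognizer written under several assumptions that are not in the text: the function name earley_recognize, the tuple encoding of chart entries, and the tags dictionary (mapping each word to its set of part of speech tags) are all illustrative choices, and the grammar is assumed to contain no rules with an empty right-hand side.

```python
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...]]             # (left-hand side, right-hand side)
Entry = Tuple[str, Tuple[str, ...], int, int]  # (lhs, rhs, dot position, start index j)


def earley_recognize(words: List[str], grammar: List[Rule], start: str,
                     tags: Dict[str, Set[str]]) -> bool:
    """Return True if the grammar derives words (assumes no empty right-hand sides)."""
    n = len(words)
    pos_tags = set().union(*tags.values())     # every symbol that is a POS tag
    # chart[k] holds the entries whose dot is at position k, in insertion order.
    chart: List[List[Entry]] = [[] for _ in range(n + 1)]

    def insert(entry: Entry, k: int) -> None:
        if entry not in chart[k]:              # never add the same entry twice
            chart[k].append(entry)

    for lhs, rhs in grammar:                   # step 1: initialize row 0
        if lhs == start:
            insert((lhs, rhs, 0, 0), 0)

    for k in range(n + 1):                     # step 2: process the rows in order
        idx = 0
        while idx < len(chart[k]):             # the row may grow while it is scanned
            lhs, rhs, dot, j = chart[k][idx]
            idx += 1
            if dot == len(rhs):                # extendothers: complete constituent
                for l2, r2, d2, i in list(chart[j]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        insert((l2, r2, d2 + 1, i), k)
            elif rhs[dot] in pos_tags:         # scaninput: next symbol is a POS tag
                if k < n and rhs[dot] in tags.get(words[k], set()):
                    insert((rhs[dot], (words[k],), 1, k), k + 1)
            else:                              # predict: expand a nonterminal
                for l2, r2 in grammar:
                    if l2 == rhs[dot]:
                        insert((l2, r2, 0, k), k)

    # Success means some start rule is complete and spans [0, n].
    return any(lhs == start and dot == len(rhs) and j == 0
               for lhs, rhs, dot, j in chart[n])


GRAMMAR: List[Rule] = [
    ("S", ("NP", "VP")),
    ("NP", ("ProperNoun",)),
    ("VP", ("V", "NP")),
]
TAGS = {"Jen": {"ProperNoun"}, "saw": {"V", "N"}, "Bill": {"ProperNoun"}}

print(earley_recognize("Jen saw Bill".split(), GRAMMAR, "S", TAGS))  # True
print(earley_recognize("Jen saw".split(), GRAMMAR, "S", TAGS))       # False
```

Each chart row is kept as a list rather than a set so that the rules added while a row is being processed are themselves visited later in the same pass, which mirrors the "for each rule r in row i" loop of step 2.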

As  we  have  presented  it,  Earleyparse  doesn’t  actually  build  a  parse  tree.  It  simply 
decides  whether  a  parse  exists.  But  it  is  straightforward  to  modify  it  so  that  the  parse 
tree(s)  that  correspond  to  successful  S  rules  can  be  extracted. 

Notice  that  Earleyparse  avoids  the  two  major  pitfalls  of  the  more  straightforward 
top-down  parsing  algorithm  that  exploits  simple  depth-first  search.  First,  we  observe 
that it will always halt, even if provided with a grammar that contains left-recursive
rules.  This  must  be  true  because  a  rule  cannot  be  added  to  the  chart  at  a  given  location 
more  than  once.  Since  there  is  a  finite  number  of  rules  and  a  finite  number  of  locations 
in  the  chart,  only  a  finite  number  of  rules  can  be  placed  on  the  chart  and  Earleyparse 
terminates  after  it  has  processed  each  of  them. 

Second, we observe that Earleyparse avoids the wasted effort of backtracking
search. Instead it reuses constituents. How it does so may not have been obvious in the
very simple example that we just considered. But suppose that we added to our
grammar a few more NP rules, the necessary prepositional phrase (PP) rules and the rule:

VP → VP PP

Now suppose that we try to parse the sentence Jen saw Bill through the window.

A backtracking, top-down parser would try the VP → V NP rule first (assuming it was
listed first). It would build an S using that VP and then realize that the S didn't span the
entire sentence. So it would back up and throw away all the work it did to build the VP,
including building the NP that dominates Bill. In this simple example, that NP doesn't
represent a lot of work, but in a less trivial sentence, it might. Then the parser would
start over to build a VP using the new rule that allows for a prepositional phrase.
Earleyparse, on the other hand, will build each of those constituents once. Since rules
are never removed from the chart, they can be reused as necessary by other, higher-level
rules. We leave working out the details of this example as an exercise.

We can analyze the complexity of Earleyparse as follows: The loop of step 2 is
executed n + 1 times. The inner loop is executed once for each rule that is already in the row.
There are O(n) of them. Whenever extendothers is called, it must compare its edge to
all the other edges in the row. And there are O(n) of them. Multiplying these together,
we get that the total number of steps is O(n³). If we want to consider the size of G, then
let |G| be the number of rules in G. The total number of steps executed by Earleyparse
becomes O(n³ · |G|²).
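The multiplication in this argument can be written out explicitly. The display below is only a restatement of the counting argument in the paragraph above, treating the grammar as fixed; the labels under the braces are informal descriptions, not notation used elsewhere in the text.

```latex
\[
\underbrace{(n+1)}_{\text{rows of the chart}}
\;\times\;
\underbrace{O(n)}_{\text{rules in each row}}
\;\times\;
\underbrace{O(n)}_{\text{edges examined by one call to \textit{extendothers}}}
\;=\; O(n^{3})
\]
```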

Exercises 

1.  Consider  the  following  grammar  that  we  presented  in  Example  15.9: 

S → AB$ | AC$
A → aA | a
B → bB | b
C → c

Show an equivalent grammar that is LL(1) and prove that it is.

2.  Assume  the  grammar: 

S → NP VP
NP → ProperNoun
NP → Det N
VP → V NP
VP → VP PP
PP → Prep NP

Assume that Jen and Bill have been tagged ProperNoun, saw has been tagged
V, through has been tagged Prep, the has been tagged Det, and window has been
tagged N. Trace the execution of Earleyparse on the input sentence Jen saw Bill
through the window.

3.  Trace  the  execution  of  Earleyparse  given  the  string  and  grammar  of  Example  15.5. 

4.  Trace  the  execution  of  a  CKY  parser  on  the  input  string  id  +  id  *  id,  given  the 
unambiguous  arithmetic  expression  grammar  shown  in  Example  11.19,  by: 

a.  Converting  the  grammar  to  Chomsky  normal  form. 

b.  Showing  the  steps  of  the  parser. 


CHAPTER  16 


Summary  and  References 


The theory of context-free languages is not as tidy as the theory of regular
languages. Interesting subsets, including the deterministic context-free languages and
the context-free languages that are not inherently ambiguous, can be shown to be
only proper subsets of the larger class of context-free languages. The context-free
languages are not closed under many common operations. There is no algorithm for
minimizing PDAs. There is no fast recognition algorithm that works for arbitrary context-free
languages. There are no decision procedures for many important questions. Yet substantial
effort has been invested in studying the context-free languages because they are useful.
The results that we have presented here have been developed by many people, including
theoreticians (who were particularly interested in the formal properties of the set),
linguists (who were interested in modeling natural languages and who found context-free
grammars to be a useful tool), and compiler writers (who were interested in building
efficient parsers for programming languages). The theory that was developed out of the
confluence of those efforts continues to provide the basis for practical parsing systems today.

Table 16.1 summarizes the properties of the context-free languages and compares
them to the regular languages.

Table 16.1 Comparing the regular and the context-free languages.

                        Regular                 Context-Free
Automaton               FSM                     PDA
Grammar(s)              Regular grammar,        Context-free grammar
                        regular expressions
ND = D?                 Yes                     No
Closed under:
   Concatenation        Yes                     Yes
   Union                Yes                     Yes
   Kleene star          Yes                     Yes
   Complement           Yes                     No
   Intersection         Yes                     No
   ∩ with Regular       Yes                     Yes
Decidable:
   Membership           Yes                     Yes
   Emptiness            Yes                     Yes
   Finiteness           Yes                     Yes
   = Σ*                 Yes                     No
   Equivalence          Yes                     No


References

The context-free grammar formalism grew out of the efforts of linguists to describe the
structure of sentences in natural languages such as English. By the mid 1940's, it was
widely understood that sentences could be described hierarchically, with a relatively
small number of immediate constituents (or ICs) at each level. For example, many
English sentences can be described as a noun phrase followed by a verb phrase. Each such
IC, until the smallest ones, could in turn be further described as a set of smaller
constituents, and so forth. [Chomsky 1956] introduced phrase structure (production-rule)
grammars as a way to describe such a structural analysis of a sentence. In [Chomsky
1959], Chomsky defined a four-level hierarchy of language classes based on the form of
the grammar rules that are allowed. Context-free grammars, in the sense in which we
have defined them, with their particular restrictions on the form of the rules, were described
there. We'll say more about the Chomsky hierarchy in Section 24.2.

Chomsky normal form was also introduced in [Chomsky 1959]. Greibach normal
form was introduced in [Greibach 1965]. Island grammars are described in [Moonen
2001]. There are many other applications of them as well. Two early uses of the related
idea, island parsing, are described in [Carroll 1983] and [Stock, Falcone and
Insinnamo 1988]. For a discussion of stochastic (probabilistic) context-free grammars and
parsing techniques, see [Jurafsky and Martin 2000].

The idea of using a pushdown stack to process naturally recursive structures, like
formulas in logic or in programming languages, was developed independently by many
people in the 1950s. (For a brief discussion of this history, as well as many other aspects
of the theory of context-free languages, see [Greibach 1981].) The pushdown automaton
was described both in [Oettinger 1961] (where it was used for the syntactic analysis of
natural language sentences as part of a machine translation system) and in
[Schutzenberger 1963]. The proof of the equivalence of context-free grammars and PDAs
appeared independently in [Evey 1963], [Chomsky 1962] and [Schutzenberger 1963].

Many key properties of context-free languages, including the Pumping Theorem
and the fact that the context-free languages are closed under intersection with the
regular languages, were described in [Bar-Hillel, Perles and Shamir 1961]. Our claim that,
if a grammar G is not self-embedding then L(G) is regular, is proved in [Hopcroft and
Ullman 1969]. The fact that the context-free languages are closed under union but not
under intersection and complement was shown in [Scheinberg 1960]. Ogden's Lemma
appeared in [Ogden 1968]. Parikh's Theorem was presented in [Parikh 1966]. The fact
that every context-free language over a single character alphabet is regular is from
[Ginsburg and Rice 1962]. For a comprehensive treatment of the mathematical theory
of context-free languages, see [Ginsburg 1966].

[Parikh 1966] proved that there exist inherently ambiguous context-free languages. It
showed that {aⁱbʲaᵏbˡ : i, j, k, l ≥ 0, i = k or j = l} is inherently ambiguous. The claim
that {aⁱbʲcᵏ : i, j, k ≥ 0, i = j or j = k} is inherently ambiguous is proved in [Harrison
1978] and in [Du and Ko 2001], using Ogden's Lemma.

The proof in Example 11.10 is taken from [Martin 2003].

See [Aho, Sethi and Ullman 1986] for a more detailed treatment of the issues that arise
in parsing artificial context-free languages, as well as a good survey of the history of the
development of parsing techniques, including the definition of LL(k) and LR(k)
languages. In particular, algorithms for computing first and follow and for eliminating all left
recursion are presented there. Lex is described in [Lesk and Schmidt 1979]. Yacc is described
in [Johnson 1979]. The CKY algorithm was independently discovered by John Cocke,
Daniel Younger, and Tadao Kasami, and described in [Younger 1967] and [Kasami 1965].
[Valiant 1975] showed that context-free parsing could be recast as matrix multiplication.
Strassen's algorithm for fast matrix multiplication was described in [Strassen 1969]. The
O(n^2.376) matrix multiplication algorithm was presented in [Coppersmith and Winograd
1990]. The claim that if there were a fast context-free parsing algorithm there would be a
fast algorithm for Boolean matrix multiplication is proved in [Lee 2002]. The claims we
made about the LR(k) languages are proved in [Hopcroft and Ullman 1979].

Earley's algorithm for general top-down parsing was first presented in [Earley 1970].
The version given here is patterned after the one in [Jurafsky and Martin 2000].


PART  IV 


TURING  MACHINES 
AND  UNDECIDABILITY 


We  are  about  to  begin  our  exploration  of  the  two  outer  circles  of  the  language 
hierarchy,  as  well  as  the  background,  the  area  outside  all  of  the  circles. 

Up  until  now,  we  have  been  placing  limitations  on  what  we  could  do  in 
order  that  we  could  discover  simple  solutions  when  they  exist. 


Now we are going to tear down all the barriers and explore the full power
of formal computation. We will discover a whole range of problems that
become solvable once we do that. We will also discover that there are
fundamental limitations on what we can compute, regardless of the specific
model with which we choose to work.


CHAPTER  17 


Turing  Machines 


We need a new kind of automaton that has two properties:

•  It  must  be  powerful  enough  to  describe  all  computable  things.  In  this  respect,  it  should 
be  like  real  computers  and  unlike  FSMs  and  PDAs. 

•  It  must  be  simple  enough  that  we  can  reason  formally  about  it.  In  this  respect,  it 
should  be  like  FSMs  and  PDAs  and  unlike  real  computers. 


17.1  Definition,  Notation  and  Examples 

In our discussion of pushdown automata, it became clear that a finite state controller
augmented by a single stack was not powerful enough to be able to execute even some
very simple programs. What else must be added in order to acquire the necessary
power? One answer is a second stack. We will explore that idea in Section 17.5.2.


17.1.1  What  Is  a  Turing  Machine? 

A more straightforward approach is to eliminate the stack and replace it by a more
flexible form of infinite storage, a writeable tape. When we do that, we get a Turing
machine. Figure 17.1 shows a simple schematic diagram of a Turing machine M.
M's tape is infinite in both directions. The input to M is written on the tape, one
character per square, before M is started. All other squares of the tape are initially blank (□).
As we have done for both FSMs and PDAs, M's behavior will be defined only on input
strings that are finite and contain only characters in M's input alphabet.

M has a single read/write head, shown here with an arrow. We will almost always use
the convention that, when M starts, its read/write head will be over the blank
immediately to the left of the leftmost character of the input. But occasionally, when we are
designing a machine to be used as a subroutine by some other machine, we may
choose a different initial specification.

[Diagram: the tape, the read/write head, and the Finite State Controller]

FIGURE 17.1 The structure of a Turing machine.


M begins in its start state. At each step of its operation, M must:

•  choose  its  next  state, 

•  write  on  the  current  square,  and 

•  move  the  read/write  head  left  or  right  one  square. 

M can move back and forth on its tape, so there is no longer the idea that it
consumes all of its input characters one at a time and then halts. M will continue to execute
until it reaches a special state called a halting state. It is possible that M may never
reach a halting state, in which case it will execute forever.

Notice  that  there  will  always  be  a  finite  number  of  nonblank  squares  on  M's  tape. 
This  follows  from  the  fact  that,  before  M  starts,  only  a  finite  number  of  squares  are 
nonblank.  Then,  at  each  step  of  its  operation,  M  can  write  on  at  most  one  additional 
square. So, after any finite number of steps, only a finite number of squares can be
nonblank. And, even if M never halts, at any point in its computation it will have executed
only a finite number of steps.

In Chapter 18 we are going to argue that the Turing machine, as we have just described
it, is as powerful as any other reasonable model of computation, including modern
computers. So, in the rest of this discussion, although our examples will be simple, remember
that we are now talking about computation, broadly conceived.

We are now ready to provide a formal definition. A Turing machine (or TM) M is a
sixtuple (K, Σ, Γ, δ, s, H), where:

•  K is a finite set of states,

•  Σ is the input alphabet, which does not contain □,

•  Γ is the tape alphabet, which must, at a minimum, contain □ and have Σ as a subset,

•  s ∈ K is the start state,

•  H ⊆ K is the set of halting states, and

•  δ is the transition function. It maps from:

   (K − H) × Γ   to   K × Γ × {→, ←}
   non-halting state × tape character   to   state × tape character × action (R or L)

If δ contains the transition ((q₀, a), (q₁, b, A)) then, whenever M is in state q₀ and the
character under the read/write head is a, M will go to state q₁, write b, and then move
the read/write head as specified by A (either one square to the right or one square to
the left).

Notice that the tape symbol □ is special in two ways:

•  Initially, all tape squares except those that contain the input string contain □.

•  The input string may not contain □.

But those are the only ways in which □ is special. A Turing machine may write □ just
as it writes any other symbol in its tape alphabet Γ. Be careful, though, if you design a
Turing machine M that does write □. Make sure that M can tell the difference between
running off the end of the input and hitting a patch of □'s within the part of the tape it
is working on. Some books use a definition of a Turing machine that does not allow
writing □. But we allow it because it can be quite useful if you are careful. In addition,
this definition allows a Turing machine to output a string that is shorter than its input
by writing □'s as necessary.

Define the active tape of a Turing machine M to be the shortest fragment of M's
tape that includes the square under the read/write head and all the nonblank squares.
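As a small illustration of this definition, the sketch below computes the active tape for a tape stored as a dictionary from square index to symbol. The dictionary representation, the integer square indices, and the function name active_tape are assumptions made for the example; squares that are absent from the dictionary are treated as blank.

```python
from typing import Dict

BLANK = "□"


def active_tape(tape: Dict[int, str], head: int) -> str:
    """Shortest tape fragment containing the head square and every nonblank square."""
    nonblank = [i for i, ch in tape.items() if ch != BLANK]
    left = min(nonblank + [head])      # leftmost of: nonblank squares, head square
    right = max(nonblank + [head])     # rightmost of: nonblank squares, head square
    return "".join(tape.get(i, BLANK) for i in range(left, right + 1))


tape = {0: "a", 1: "a", 2: "b"}
print(active_tape(tape, -1))   # □aab  (head on the blank left of the input)
print(active_tape(tape, 1))    # aab   (head inside the nonblank region)
print(active_tape({}, 0))      # □     (empty tape: just the head square)
```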

We require that δ be defined for all (state, input) pairs unless the state is a halting
state. Notice that δ is a function, not a relation. So this is a definition for deterministic
Turing machines.

One  other  important  observation:  A  Turing  machine  can  produce  output,  namely 
the  contents  of  its  tape  when  it  halts. 


EXAMPLE  17.1  Add  b's  to  Make  Them  Match  the  a's 


Design a Turing machine M that takes as input a string in the language
{aⁱbʲ : 0 ≤ j ≤ i} and adds b's as required to make the number of b's equal the
number of a's. The input to M will look like this:


On  that  input,  the  output  (the  contents  of  the  tape  when  M  halts)  should  be: 


M  will  operate  as  follows: 


1. Move one square to the right. If the character under the read/write head is
□, halt. Otherwise, continue.

2. Loop:

2.1. Mark off an a with a $.

2.2. Scan rightward to the first b or □.

• If b, mark it off with a # and get ready to go back and find the next
matching a, b pair.

• If □, then there are no more b's but there are still a's that need
matches. So it is necessary to write another b on the tape. But that b
must be marked so that it cannot match another a. So write a #.
Then get ready to go back and look for remaining unmarked a's.

2.3. Scan back leftward looking for a or □. If a, then go back to the top of
the loop and repeat. If □, then all a's have been handled. Exit the loop.
(Notice that the specification for M guarantees that there will not be
more b's than a's.)

3. Make one last pass all the way through the nonblank area of the tape, from
left to right, changing each $ to an a and each # to a b.

4.  Halt. 
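The numbered steps above can be tried out in software. The following is a minimal Python sketch, not the book's formal machine: the state names (start, first_a, find_b, find_a, restore, halt), the dictionary encoding of the transitions, and the step budget are all assumptions made for illustration, and only transitions that can actually be reached on legal inputs are listed.

```python
BLANK = "□"
LEFT, RIGHT = -1, +1

# (state, symbol read) -> (next state, symbol written, head move).
# State names are hypothetical labels for the phases of the numbered steps.
DELTA = {
    ("start", BLANK): ("first_a", BLANK, RIGHT),   # step 1: move onto the input
    ("first_a", BLANK): ("halt", BLANK, RIGHT),    # step 1: empty input, halt
    ("first_a", "a"): ("find_b", "$", RIGHT),      # step 2.1: mark an a
    ("find_b", "a"): ("find_b", "a", RIGHT),       # step 2.2: scan rightward
    ("find_b", "$"): ("find_b", "$", RIGHT),
    ("find_b", "#"): ("find_b", "#", RIGHT),
    ("find_b", "b"): ("find_a", "#", LEFT),        # mark an existing b
    ("find_b", BLANK): ("find_a", "#", LEFT),      # or add a new, marked b
    ("find_a", "$"): ("find_a", "$", LEFT),        # step 2.3: scan leftward
    ("find_a", "#"): ("find_a", "#", LEFT),
    ("find_a", "a"): ("find_b", "$", RIGHT),       # another a: mark it (2.1)
    ("find_a", BLANK): ("restore", BLANK, RIGHT),  # no a's left: exit the loop
    ("restore", "$"): ("restore", "a", RIGHT),     # step 3: final pass
    ("restore", "#"): ("restore", "b", RIGHT),
    ("restore", BLANK): ("halt", BLANK, LEFT),     # step 4: halt
}


def run(w, max_steps=10_000):
    """Run the sketch on input w and return the nonblank tape contents."""
    tape = {i + 1: c for i, c in enumerate(w)}  # square 0 is the blank left of w
    state, head = "start", 0
    for _ in range(max_steps):
        if state == "halt":
            return "".join(tape[i] for i in sorted(tape) if tape[i] != BLANK)
        state, tape[head], move = DELTA[(state, tape.get(head, BLANK))]
        head += move
    raise RuntimeError("step budget exhausted")


print(run("aab"))
```

Because the a's after the first one are found while scanning leftward, this sketch processes them right to left.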


M = ({1, 2, 3, 4, 5, 6}, {a, b}, {a, b, □, $, #}, δ, 1, {6}), where δ =

( ((1, □), (2, □, →)),
  ((1, a), (2, □, →)),
  ((1, b), (2, □, →)),
  ((1, $), (2, □, →)),
  ((1, #), (2, □, →)),
  ((2, □), (6, $, →)),
  ((2, a), (3, $, →)),
  ((2, b), (3, $, →)),
  ((2, $), (3, $, →)),
  ((2, #), (3, $, →)),
  ((3, □), (4, #, ←)),
  ((3, a), (3, a, →)),
  ((3, b), (4, #, ←)),
  ((3, $), (3, $, →)),
  ((3, #), (3, #, →)),
  ((4, □), (5, □, →)),
  ((4, a), (3, $, →)),
  ((4, $), (4, $, ←)),
  ((4, #), (4, #, ←)),
  ((5, □), (6, □, ←)),
  ((5, $), (5, a, →)),
  ((5, #), (5, b, →)) )

Notes on the table:

• The four transitions out of state 1 on a, b, $, and # are required because δ must be
defined for every state/input pair, but since it isn't possible to see anything except □
in state 1, it doesn't matter what they do.

• The transitions out of state 2 on b, $, and # are three more unusable elements of δ.
We'll omit the rest here for clarity.

• State 6 is a halting state and so has no transitions out of it.

People find it nearly impossible to read transition tables like this one, even for very
simple machines. So we will adopt a graphical notation similar to the one we used for
both FSMs and PDAs. Since each element of δ has five components, we need a notation
for labeling arcs that includes all the required information. Let x/t/a on an arc of M
mean that the transition can be taken if the character currently under the read/write
head is x. If it is taken, write t and then move the read/write head as specified by a. We
will also adopt the convention that we will omit unusable transitions, such as
((1, a), (2, □, →)) in the example above, from our diagrams so that they are easier to read.


EXAMPLE  17.2  Using  the  Graphical  Language 

Here  is  a  graphical  description,  using  the  notation  we  just  described,  of  the  ma¬ 
chine  from  Example  17.1: 


17.1.2  Programming  Turing  Machines 

Although there is a lot less practical motivation for learning to program a Turing
machine than there is for learning to build FSMs, regular expressions, context-free
grammars, and parsers, it is interesting to see how a device that is so simple can actually
be made to compute whatever we can compute using the fastest machines in our labs
today. It seems to highlight the essence of what it takes to compute.

In Chapter 18, we will argue that anything computable can be computed by a Turing
machine. So we should not expect to find simple nuggets that capture everything a Turing
machine programmer needs to know. But, at least for the fairly straightforward language
recognition problems that we will focus on, there are a few common programming idioms.
The example we have just shown illustrates them:

• A computation will typically occur in phases: When phase 1 finishes, phase 2 begins,
and so forth.

• One phase checks for corresponding substrings by moving back and forth, marking
off corresponding characters.

• There are two common ways to go back and forth. Suppose the input string is
aaaabbbb and we want to mark off the a's and make sure they have corresponding
b's. Almost any sensible procedure marks the first a, scans right, and marks the first
b. There are then two ways to approach doing the rest. We could:

  • Scan left to the first a we find and process the rest of the a's right to left. That is
  the approach we took in Example 17.1.

  • Scan all the way left until we find the first marked a. Bounce back one square to
  the right and mark the next a. In this approach, we process all the a's left to right.

  Both ways work. Sometimes it seems easier to use one, sometimes the other.

• If we care about the machine's output (as opposed to caring just about whether it
accepts or rejects), then there is a final phase that makes one last pass over the tape
and converts the marked characters back to their proper form.

17.1.3  Halting 

We make the following important observations about the three kinds of automata that
we have so far considered:

• A DFSM M, on input w, is guaranteed to halt in |w| steps. We proved this result as
Theorem 5.1. An arbitrary NDFSM can be simulated by ndfsmsimulate and that
simulation will also halt in |w| steps.

• An arbitrary PDA, on input w, is not guaranteed to halt. But, as we saw in Chapter 14,
for any context-free language L there exists a PDA M that accepts L and that is
guaranteed to halt.

• A Turing machine M, on input w, is not guaranteed to halt. It could, instead, bounce
back and forth forever on its tape. Or it could just blast its way, in a single direction,
through the input and off forever into the infinite sequence of blanks on the tape.
And now, unlike with PDAs, there exists no algorithm to find an equivalent Turing
machine that is guaranteed to halt.

This fundamental property of Turing machines, that they cannot be guaranteed to
halt, will drive a good deal of our discussion about them.


17.1.4  Formalizing  the  Operation  of  a  Turing  Machine 

In  this  section  we  will  describe  formally  the  computation  process  that  we  outlined  in 
the  last  section. 

A configuration of a Turing machine M = (K, Σ, Γ, δ, s, H) is a 4-tuple that is an
element of:

K × ((Γ - {□}) Γ*) ∪ {ε} × Γ × (Γ* (Γ - {□})) ∪ {ε}.

The four components are:

• the state,
• a string that includes all of M's active tape to the left of the read/write head,
• the square under the read/write head, and
• a string that includes all of M's active tape to the right of the read/write head.

Notice that, although M's tape is infinite, the description of any configuration is
finite because we include in that description the smallest contiguous tape fragment
that includes all the nonblank squares and the square under the read/write head.

We will use the following shorthand for configurations: (q, s1, a, s2) will be written
as (q, s1 a̲ s2), where the underlined character is the one under the read/write head.

The initial configuration of any Turing machine M with start state s and input w is
(s, □̲w). Any configuration whose state is an element of H is a halting configuration.


EXAMPLE  17.3  Using  the  4-Tuple  Notation  and  the  Shorthand 

Tape                                                  As a 4-tuple        Shorthand

□ a b b b □ □    (head on the second b)               (q, ab, b, b)       (q, abb̲b)

... □ a a b b □  (head on the blank before the a's)   (q, ε, □, aabb)     (q, □̲aabb)


The transition function δ defines the operation of a Turing machine M one step at a
time. We can use it to define the sequence of configurations that M will enter. We start
by defining the relation yields-in-one-step, written |-M, which relates configuration c1
to configuration c2 iff M can move from configuration c1 to configuration c2 in one
step. So, just as we did with FSMs and PDAs, we define:

(q1, w1) |-M (q2, w2) iff (q2, w2) is derivable, via δ, in one step.

We can now define the relation yields, written |-M*, to be the reflexive, transitive
closure of |-M. So configuration C1 yields configuration C2 if:

C1 |-M* C2.

A path through M is a sequence of configurations C0, C1, C2, ... such that C0 is an
initial configuration of M and:

C0 |-M C1 |-M C2 |-M ... .

A computation by M is a path that halts. So it is a sequence of configurations
C0, C1, ..., Cn for some n ≥ 0, such that C0 is an initial configuration of M, Cn is a halting
configuration of M, and:

C0 |-M C1 |-M C2 |-M ... |-M Cn.

If a computation halts in n steps, we will say that it has length n and we will write:

C0 |-M^n Cn
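The definitions of configuration, yields-in-one-step, and computation translate almost directly into code. The sketch below is illustrative only: the function names step and computation, the "L"/"R" move encoding, the step budget, and the three-state example machine are assumptions, not part of the formal definition.

```python
BLANK = "□"


def step(delta, config):
    """Yields-in-one-step: the successor of a non-halting configuration.

    A configuration is the 4-tuple (state, left, current, right).
    """
    q, left, cur, right = config
    q2, written, move = delta[(q, cur)]
    if move == "R":
        left, cur, right = left + written, right[:1] or BLANK, right[1:]
    else:  # move == "L"
        left, cur, right = left[:-1], left[-1:] or BLANK, written + right
    # Keep only the active tape: no leading blanks on the left string and
    # no trailing blanks on the right string.
    return (q2, left.lstrip(BLANK), cur, right.rstrip(BLANK))


def computation(delta, halting, config, max_steps=1000):
    """Return the path C0, C1, ..., Cn that ends in a halting configuration."""
    path = [config]
    while path[-1][0] not in halting:
        if len(path) > max_steps:
            raise RuntimeError("no halting configuration within the step budget")
        path.append(step(delta, path[-1]))
    return path


# Hypothetical machine: sweep right over a's, then step back left and halt.
delta = {
    ("s", BLANK): ("r", BLANK, "R"),
    ("r", "a"): ("r", "a", "R"),
    ("r", BLANK): ("h", BLANK, "L"),
}
path = computation(delta, {"h"}, ("s", "", BLANK, "aa"))
for c in path:
    print(c)
print("length:", len(path) - 1)
```

The length of the computation is the number of steps, which is one less than the number of configurations on the path.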
17.1.5 A Macro Notation for Turing Machines

Writing even very simple Turing machines is time consuming and reading a description
of one and making sense of it is even harder. Sometimes we will simply describe, at a
high level, how a Turing machine should operate. But there are times when we would
like to be able to specify a machine precisely. So, in this section, we present a macro
language that will make the task somewhat easier. If you don't care about the details of
how Turing machines work, you can skip this section. In most of the rest of our examples,
we will give the high level description first, followed (when it's feasible) by a description in
this macro language.

The  key  idea  behind  this  language  is  the  observation  that  we  can  combine  smaller 
Turing  machines  to  build  more  complex  ones.  We  begin  by  defining  simple  machines 
that  perform  the  basic  operations  of  writing  on  the  tape,  moving  the  read/write  head, 
and  halting: 

• Symbol writing machines: For each x ∈ Γ, define M_x, written just x, to be a Turing
machine that writes x on the current square of the tape and then halts. So, if Γ =
{a, b, □}, there will be three simple machines: a, b, and □. (A technical note: Given
our definition of a Turing machine, each of these machines must actually make two
moves. In the first move, it writes the new symbol on the tape and moves right. In the
next move, it rewrites whatever character was there and then moves left. These two
moves are necessary because our machines must move at each step. But this is a detail
with which we do not want to be concerned when we are writing Turing machine
programs. This notation hides it from us.)

• Head moving machines: There are two of these: R rewrites whatever character was
on the tape and moves one square to the right. L rewrites whatever character was
on the tape and moves one square to the left.

• Machines that simply halt: Each of our machines halts when it has nothing further
to do (i.e., it has entered a state on which δ is undefined), but there are
times when we'll need to indicate halting explicitly. We will use three simple halting
machines:

  • h, which simply halts. We will use h when we want to make it clear that some
  path of a machine halts, but we do not care about accepting or rejecting.

  • n, which halts and rejects.

  • y, which halts and accepts.

Next  we  need  to  describe  how  to: 

•  Check  the  tape  and  branch  based  on  what  character  we  see,  and 

•  Combine  the  basic  machines  to  form  larger  ones. 

We can do both of these things with a notation that is very similar to the one we
have used for all of our state machines so far. We will use two basic forms:

• M1 M2: Begin in the start state of M1. Run M1 until it halts. If it does, begin M2 in its
start state (without moving the read/write head) and run M2 until it halts. If it does,
then halt. If either M1 or M2 fails to halt, M1 M2 will fail to halt.

• M1 --condition--> M2: Begin in the start state of M1. Run M1 until it halts. If it does,
check condition. If it is true, then begin M2 in its start state (without moving the
read/write head) and run M2 until it halts. The simplest condition will be the presence
of a specific character under the read/write head, although we will introduce
some others as well. A machine with this structure will fail to halt if either:

  • M1 fails to halt, or

  • condition is true and M2 fails to halt.

We will use the symbol > to indicate where the combination machine begins.

EXAMPLE  17.4  The  Macro  Language  Lets  Machines  be  Composed 
Let M =

> M1 --a--> M2
  M1 --b--> M3

So M:

• Starts in the start state of M1.

• Computes until M1 reaches a halting state.

• Examines the tape. If the current symbol is a, then it transfers control to M2. If
the current symbol is b, it transfers control to M3.

To make writing our machines a bit easier, we introduce some shorthands:

M1 --a--> M2 together with M1 --b--> M2     becomes     M1 --a, b--> M2

M1 --all elements of Γ except a--> M2       becomes     M1 --¬a--> M2


Next we provide a simple mechanism for storing values in variables. Each variable
will hold just a single character. A standard Turing machine can remember values for
any finite number of such variables either by writing them someplace on its tape or by
branching to a different state for each possible value. This second solution avoids having
to scan back and forth on the tape, but it can lead to an explosion in the number of
states since there must be effectively a new copy of the machine for each combination
of values that a set of variables can have. We will hide the mechanism by which variables
are implemented by allowing them to be named and explicitly referenced in the
conditions on the arcs of our machines. So we have:

M1 --all elements of Γ except a--> M2     becomes     M1 --x ← ¬a--> M2
and x takes on the value of the current square.

M1 --a, b--> M2     becomes     M1 --x ← a, b--> M2
and x takes on the value of the current square.

We can use the value of a variable in two ways. The first is as a condition on a transition.
So we can write:

M1 --x = y--> M2
if x = y then take the transition.

Note that we use ← for assignment and = and ≠ for Boolean comparison. We can
also write the value of a variable. We'll indicate that with the variable's name.


EXAMPLE  17.5  Using  Variables  to  Remember  Single  Characters 
Let M =

> --x ← ¬□--> R x

If  the  current  square  is  not  blank,  M  remembers  its  value  in  the  variable  x,  goes 
right  one  square,  and  copies  it  by  writing  the  value  of  x.  (If  the  current  square  is 
blank,  M  has  nothing  to  do.  So  it  halts.) 

Next we define some very useful machines that we can build from the primitives we
have so far:

> R, with an arc labeled ¬□ looping back to R
    Move right. If the character under the read/write head is not □, repeat. If
    it is □, no further action is specified, so halt. In other words, find the first
    blank square to the right of the current square. We will abbreviate this R_□.

> L, with an arc labeled ¬□ looping back to L
    Move left. If the character under the read/write head is not □, repeat. If
    it is □, no further action is specified, so halt. In other words, find the first
    blank square to the left of the current square. We will abbreviate this L_□.

> R, with an arc labeled □ looping back to R
    Similarly, but find the first nonblank square to the right of the current
    square. We will abbreviate this R_¬□.

> L, with an arc labeled □ looping back to L
    Similarly, but find the first nonblank square to the left of the current
    square. We will abbreviate this L_¬□.

We can do the same thing we have just done for □ with any other character in Γ. So
we can write:

L_a
    Find the first occurrence of a to the left of the current square.

R_{a,b}
    Find the first occurrence of a or b to the right of the current square.

L_{a,b} --a--> M1
L_{a,b} --b--> M2
    Find the first occurrence of a or b to the left of the current square,
    then go to M1 if the detected character is a; go to M2 if the detected
    character is b.

L_{x←a,b}
    Find the first occurrence of a or b to the left of the current square and
    set x to the value found.

L_{x←a,b} R x
    Find the first occurrence of a or b to the left of the current square, set
    x to the value found, move one square to the right, and write x (a or b).
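One way to get a feel for the macro language is to model each machine as a function that acts on a shared tape. The sketch below is only an analogy in Python: the Tape class and the helper names seq, scan, and copy_right are hypothetical, and the scanning machines here, like their Turing machine counterparts, will run forever if the character they are looking for never appears.

```python
BLANK = "□"


class Tape:
    """Hypothetical tape: a dict of written squares plus a head position."""

    def __init__(self, w):
        self.cells = {i + 1: c for i, c in enumerate(w)}  # square 0: blank left of w
        self.head = 0

    def read(self):
        return self.cells.get(self.head, BLANK)

    def write(self, c):
        self.cells[self.head] = c

    def contents(self):
        if not self.cells:
            return ""
        lo, hi = min(self.cells), max(self.cells)
        return "".join(self.cells.get(i, BLANK) for i in range(lo, hi + 1)).strip(BLANK)


def R(t):  # head moving machine R
    t.head += 1


def L(t):  # head moving machine L
    t.head -= 1


def write(c):  # symbol writing machine c
    return lambda t: t.write(c)


def seq(*machines):  # M1 M2 ...: run each machine in turn
    def run(t):
        for m in machines:
            m(t)
    return run


def scan(move, stop):  # move once, then keep moving until stop(symbol) holds
    def run(t):
        move(t)
        while not stop(t.read()):
            move(t)
    return run


R_blank = scan(R, lambda c: c == BLANK)        # R_□
L_blank = scan(L, lambda c: c == BLANK)        # L_□
R_nonblank = scan(R, lambda c: c != BLANK)     # R_¬□
L_ab = scan(L, lambda c: c in ("a", "b"))      # L_{a,b}


def copy_right(t):  # Example 17.5: > --x ← ¬□--> R x
    x = t.read()
    if x != BLANK:
        seq(R, write(x))(t)


t = Tape("ab")
R(t)            # onto the first a
copy_right(t)   # remember a, move right, write a over the b
print(t.contents(), t.head)
```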


EXAMPLE  17.6  Triplicating  a  String 


We wish to build M with the following specification:

Input: □w, where w ∈ {1}*

Output: □w^3

Example: Input: □111 Output: □111111111

M will operate as follows on input w:

1. Loop:

1.1. Move right to the first 1 or □.

1.2. If the current character is □, all the 1s have been copied. Exit the
loop. Otherwise the current character must be a 1. Mark it off with #
(so it won't get copied again), move right to the first blank, and write
two more #'s.

1.3. Go left back to the blank in front of the string.

2. Make one final pass through the string converting the #'s back to 1's.
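The steps of Example 17.6 can be mimicked with a Python list standing in for the tape and an integer for the head. This is a sketch of the procedure, not the macro-language machine itself; the function name triplicate and the list representation are assumptions.

```python
BLANK = "□"


def triplicate(w):
    """Follow the steps of Example 17.6 on a list-based tape."""
    tape = [BLANK] + list(w) + [BLANK]  # square 0 is the blank in front of w
    head = 0
    while True:
        head += 1                                  # 1.1: right to the first 1 or blank
        while tape[head] not in ("1", BLANK):
            head += 1
        if tape[head] == BLANK:                    # 1.2: every 1 has been copied
            break
        tape[head] = "#"                           # mark the 1
        while tape[head] != BLANK:                 # right to the first blank
            head += 1
        tape[head:head + 1] = ["#", "#", BLANK]    # two more #'s; keep a blank at the end
        while tape[head] != BLANK:                 # 1.3: back to the blank in front
            head -= 1
    # 2: final pass converting the #'s back to 1's
    return "".join("1" if c == "#" else c for c in tape).strip(BLANK)


print(triplicate("111"))
```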


EXAMPLE  17.7  Shifting  Left  One  Square 

We wish to build a shifting machine S_← with the following specification, where u
and w are strings that do not contain any □'s:

Input: □u□w□

Output: □uw□

Example: Input: 11□00

Output: 1100□

S_← moves left to right through w, copying each character onto the square immediately
to its left:

> R --x ← ¬□--> □ L x R     (loop back to the first R; on □, leave the loop)
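The loop of the shifting machine is short enough to trace in code. In the sketch below, which is illustrative only, the function name shift_left is hypothetical and the caller is assumed to pass the index of the blank that separates u from w as the starting head position.

```python
BLANK = "□"


def shift_left(tape, head):
    """Shift w one square left; head starts on the blank between u and w."""
    tape = list(tape) + [BLANK]     # work on a copy with a blank at the right end
    head += 1                       # R
    while tape[head] != BLANK:      # x ← ¬□
        x = tape[head]
        tape[head] = BLANK          # □
        head -= 1                   # L
        tape[head] = x              # x
        head += 1                   # R
        head += 1                   # loop back to the first R
    return "".join(tape).strip(BLANK)


print(shift_left("□11□00□", 3))
```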


17.2  Computing  With  Turing  Machines 

Now that we know how Turing machines work, we can describe how to use a Turing
machine to:

•  recognize  a  language,  or 

•  compute  a  function. 


17.2.1  Turing  Machines  as  Language  Recognizers 

Given a language L, we would like to be able to design a Turing machine M that takes
as input (on its tape) some string w and tells us whether or not w ∈ L. There are many
languages for which it is going to be possible to do this. Among these are all of the
noncontext-free languages that we discussed in Part III (as well as all the regular and
context-free languages for which we have built FSMs and PDAs).

However, as we will see in Chapter 19 and others that follow it, there are many languages
for which even the power of the Turing machine is not enough. In some of those
cases, there is absolutely nothing better that we can do. There exists no Turing machine
that can distinguish between strings that are in L and strings that are not. But there are
other languages for which we can solve part of the problem. For each of these languages
we can build a Turing machine M that looks for the property P (whatever it is)
of being in L. If M discovers that its input possesses P, it halts and accepts. But if P does
not hold for some input string w, then M may keep looking forever. It may not be able
to tell that it is not going to find P and thus it should halt and reject.

In this section we will define what it means for a Turing machine to decide a language
(i.e., for every string, accept or reject as appropriate) and for a Turing machine to
semidecide a language (i.e., to accept when it should).

Deciding a Language

Let M be a Turing machine with start state s and two halting states that we will call y
and n. Let w be an element of Σ*. Then we will say that:

• M accepts w iff (s, □̲w) |-M* (y, w') for some string w'. We call any configuration
(y, w') an accepting configuration.

• M rejects w iff (s, □̲w) |-M* (n, w') for some string w'. We call any configuration
(n, w') a rejecting configuration.

Notice that we do not care what the contents of M's tape are when it halts. Also note
that if M does not halt, it neither accepts nor rejects.

Let Σ be the input alphabet of M. Then M decides a language L ⊆ Σ* iff, for any
string w ∈ Σ*, it is true that:

• If w ∈ L then M accepts w, and

• If w ∉ L then M rejects w.

Since every string in Σ* is either in L or not in L, any deciding machine M must halt
on all inputs. A language L is decidable iff there is a Turing machine M that decides it.

We define the set D to be the set of all decidable languages. So a language L is in D
iff there is a Turing machine that decides it. In some books, the set D is called R, or the
set of recursive languages.


EXAMPLE  17.8  AnBnCn 

Recall the language AnBnCn = {a^n b^n c^n : n ≥ 0}, which we showed was not
context-free and so could not be recognized with a PDA. AnBnCn is decidable.
We can build a straightforward Turing machine M to decide it. M will work as
follows on input w:

1. Move right onto w. If the first character is □, halt and accept.

2. Loop:

2.1. Mark off an a with a 1.

2.2. Move right to the first b and mark it off with a 2. If there isn't one, or if
there is a c first, halt and reject.

2.3. Move right to the first c and mark it off with a 3. If there isn't one, or if
there is an a first, halt and reject.

2.4. Move all the way back to the left, then right again past all the 1s (the
marked off a's). If there is another a, go back to the top of the loop. If
there isn't, exit the loop.

3. All a's have found matching b's and c's and the read/write head is just to
the right of the region of marked off a's. Continue moving left to right to
verify that all b's and c's have been marked. If they have, halt and accept.
Otherwise halt and reject.
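The decision procedure above can be followed literally on a list-based tape. The Python sketch below is an illustration under stated assumptions: the function name decides_anbncn is hypothetical, the scan in step 2.2 is assumed to skip only unmarked a's and marked b's (2's), the scan in step 2.3 is assumed to skip only unmarked b's and marked c's (3's), and an input that does not begin with a falls through to the check in step 3.

```python
BLANK = "□"


def decides_anbncn(w):
    """Return True (accept) or False (reject) for w over {a, b, c}."""
    tape = [BLANK] + list(w) + [BLANK]
    head = 1                                # 1: move right onto w
    if tape[head] == BLANK:
        return True                         # empty string: accept
    while tape[head] == "a":                # 2: loop while an unmarked a is found
        tape[head] = "1"                    # 2.1: mark off an a
        head += 1
        while tape[head] in ("a", "2"):     # 2.2: right to the first b
            head += 1
        if tape[head] != "b":
            return False
        tape[head] = "2"
        head += 1
        while tape[head] in ("b", "3"):     # 2.3: right to the first c
            head += 1
        if tape[head] != "c":
            return False
        tape[head] = "3"
        while tape[head] != BLANK:          # 2.4: all the way back to the left
            head -= 1
        head += 1
        while tape[head] == "1":            # then right past the marked a's
            head += 1
    while tape[head] in ("2", "3"):         # 3: everything else must be marked
        head += 1
    return tape[head] == BLANK


for s in ["", "abc", "aabbcc", "aabbc", "abcabc", "bc"]:
    print(repr(s), decides_anbncn(s))
```

Every loop in the sketch moves the head toward a blank that is guaranteed to be there, so the procedure halts on every input, which is what deciding requires.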

In  our  macro  language,  M  is: 




EXAMPLE  17.9  WcW 

Consider  again  WcW  =  {wcw :  w  e  {a,  b}*}.  We  can  build  M  to  decide  WcW  as 
follows: 

1.  Loop: 

1.1.  Move  right  to  the  first  character.  If  it  is  c,  exit  the  loop.  Otherwise,  over¬ 
write  it  with  □  and  remember  what  it  is. 

1.2. Move right to the c. Then continue right to the first unmarked character.
If it is □, halt and reject. (This will happen if the string to the right of c is
shorter than the string to the left.) If it is anything else, check to see
whether it matches the remembered character from the previous step. If
it does not, halt and reject. If it does, mark it off with #.

1.3. Move back leftward to the first □.

2.  There  are  no  characters  remaining  before  the  c.  Make  one  last  sweep  left  to 
right  checking  that  there  are  no  unmarked  characters  after  the  c  and  before 
the  first  blank.  If  there  are,  halt  and  reject.  Otherwise,  halt  and  accept. 
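The WcW procedure can be traced the same way. In this Python sketch the function name decides_wcw is hypothetical, and one case that the steps leave implicit is filled in by assumption: if a blank is reached before any c is seen, the input contains no c and the sketch rejects.

```python
BLANK = "□"


def decides_wcw(w):
    """Return True (accept) or False (reject) for w over {a, b, c}."""
    tape = [BLANK] + list(w) + [BLANK]
    head = 0
    while True:
        head += 1                          # 1.1: move right to the first character
        if tape[head] == "c":
            break
        if tape[head] == BLANK:            # assumption: no c at all, so reject
            return False
        x, tape[head] = tape[head], BLANK  # overwrite with □ and remember it
        while tape[head] != "c":           # 1.2: move right to the c
            head += 1
            if tape[head] == BLANK:        # assumption: no c at all, so reject
                return False
        head += 1
        while tape[head] == "#":           # continue to the first unmarked character
            head += 1
        if tape[head] != x:                # blank (too short) or a mismatch
            return False
        tape[head] = "#"
        while tape[head] != BLANK:         # 1.3: back leftward to the first □
            head -= 1
    head += 1                              # 2: sweep the part after the c
    while tape[head] == "#":
        head += 1
    return tape[head] == BLANK


for s in ["c", "abcab", "abcba", "abca", "acaa", "ab"]:
    print(repr(s), decides_wcw(s))
```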

In  our  macro  language,  M  is: 


Semideciding  a  Language 

Let Σ be the input alphabet to a Turing machine M. Let L ⊆ Σ*. Then we will say that
M semidecides L iff, for any string w ∈ Σ*:

• If w ∈ L then M accepts w, and

• If w ∉ L then M does not accept w. In this case, M may explicitly reject or it may loop.

A language L is semidecidable iff there is a Turing machine that semidecides it. We
define the set SD to be the set of all semidecidable languages. So a language L is in SD iff
there is a Turing machine that semidecides it. In some books, the set SD is called RE, or
the set of recursively enumerable languages or the set of Turing-recognizable languages.


EXAMPLE  17.10  Semideciding  by  Running  Off  the  Tape 

Let L = b*a(a ∪ b)*. So, any machine that accepts L must look for at least one a.
We can build M to semidecide L:

1. Loop:

Move one square to the right. If the character under the read/write head is
an a, halt and accept.

In our macro language, M is:

> R --a--> y     (on any other character, loop back to R)

Of course, for L, we can do better than M. M# decides L:

1. Loop:

Move one square to the right. If the character under the read/write head is
an a, halt and accept. If it is □, halt and reject.

In our macro language, M# is:

> R --a--> y     (on b, loop back to R)
  R --□--> n

As we will prove later, there are languages that are in SD but not D and so a
semideciding Turing machine is the best we will be able to build for those languages.
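The difference between the two machines of Example 17.10 shows up clearly when they are run with a bound on the number of steps. The Python sketch below is an illustration: the function names and the max_steps parameter are hypothetical, and returning None stands for "has not halted within the budget", which is all an outside observer can ever say about a machine that loops.

```python
BLANK = "□"


def m_semidecide(w, max_steps):
    """M: accept on the first a, otherwise keep moving right forever.

    Returns True if M accepts within max_steps moves, else None.
    """
    tape = BLANK + w
    head = 0
    for _ in range(max_steps):
        head += 1                                  # R
        if head < len(tape) and tape[head] == "a":
            return True                            # y
    return None                                    # still running


def m_decide(w):
    """M#: accept on the first a, reject on reaching the blank after w."""
    for c in w + BLANK:
        if c == "a":
            return True                            # y
        if c == BLANK:
            return False                           # n


print(m_semidecide("bba", 100), m_semidecide("bbb", 100))
print(m_decide("bba"), m_decide("bbb"))
```

No choice of budget turns the first function into a decider in general: a None result never tells us whether a larger budget would have produced an acceptance.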


17.2.2  Turing  Machines  Compute  Functions 

When  a  Turing  machine  halts,  there  is  a  value  on  its  tape.  When  we  build  deciding  and 
semideciding Turing  machines,  we  ignore  that  value.  But  we  don’t  have  to.  Instead,  we 
can  define  what  it  means  for  a  Turing  machine  to  compute  a  function.  We'll  begin  by 
defining  what  it  means  for  a  Turing  machine  to  compute  a  function  whose  domain  and 
range  are  sets  of  strings.  Then  we’ll  see  that,  by  using  appropriate  encodings  of  other 
data  types  and  of  multiple  input  values,  we  can  define  Turing  machines  to  compute  a 
wide  variety  of  functions. 

In  this  section,  we  consider  only  Turing  machines  that  always  halt.  In  Chapter  25  we 
will  expand  this  discussion  to  include  Turing  machines  that  sometimes  fail  to  halt. 

Let M be a Turing machine with start state s, halting state h, and input alphabet Σ.
The initial configuration of M will be (s, □̲w), where w ∈ Σ*.

Define M(w) = z iff (s, □̲w) |-M* (h, □̲z). In other words M(w) = z iff M, when
started on a string w in Σ*, halts with z on its tape and its read/write head is just to the
left of z.

Let Σ' ⊆ Γ be M's output alphabet (i.e., the set of symbols that M may leave on its
tape when it halts).

Now, let f be any function that maps from Σ* to Σ'*. We say that a Turing machine
M computes a function f iff, for all w ∈ Σ*:

• If w is an input on which f is defined, M(w) = f(w). In other words, M halts with
f(w) on its tape.

• Otherwise M(w) does not halt.

A function f is recursive or computable iff there is a Turing machine M that computes
it and that always halts. The term computable more clearly describes the essence
of these functions. The traditional name for them, however, is recursive. We will see
why that is in Chapter 25. In the meantime, we will use the term computable.⁷

⁷ In some other treatments of this subject, a function f is computable iff there is some Turing machine M
(which may not always halt) that computes it. Specifically, if there are values for which f is undefined, M will
fail to halt on those values. We will say that such a function is partially computable and we will reserve the
term computable for that subset of the partially computable functions that can be computed by a Turing
machine that always halts.

There is a natural correspondence between the use of Turing machines to compute
functions and their use as language deciders. A language is decidable iff its characteristic
function is computable. In other words, a language L is decidable iff there exists a Turing
machine that always halts and that outputs True if its input is in L and False otherwise.

EXAMPLE 17.11 Duplicating a String

Let duplicate(w) = ww, where w is a string that does not contain □.

A Turing machine to compute duplicate can be built easily if we have two
subroutines:

• The copy machine C, which will perform the following operation:

□w□ → □w□w□

C will work by moving back and forth on the tape, copying characters one at a
time:

> R --x ← ¬□--> □ R_□ R_□ x L_□ L_□ x     (loop back to the first R)
  R --□--> h

We define C this way because the copy process is straightforward if there is a
character (we use □) to delimit the two copies.

• The S_← machine, which we described in Section 17.1.5. We will use S_← to shift
the second copy of w one square to the left.

M, defined as follows, computes duplicate:

> C S_← L_□

Now suppose that we want to compute functions on values other than strings. All we
need to do is to encode those values as strings. To make it easy to describe functions on
such values, define a family of functions, value_k(n). For any positive integer k,
value_k(n) returns the nonnegative integer that is encoded, base k, by the string n.
For example, value_2(101) = 5 and value_8(101) = 65. We will say that a Turing machine
M computes a function f from N^m to N provided that, for some k,
value_k(M(n1; n2; ... nm)) = f(value_k(n1), ... value_k(nm)).
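The value_k encoding and the condition for computing a numeric function can be checked with a few lines of Python. This is a sketch under assumptions: it handles only bases 2 through 10 with decimal digit characters, the helper name agrees is hypothetical, and the machine is represented by an ordinary Python function from strings to strings rather than by a Turing machine.

```python
def value_k(n: str, k: int) -> int:
    """Return the nonnegative integer encoded, base k, by the digit string n."""
    if not 2 <= k <= 10:
        raise ValueError("this sketch handles bases 2 through 10 only")
    total = 0
    for ch in n:
        d = int(ch)
        if d >= k:
            raise ValueError(f"digit {ch!r} is not valid in base {k}")
        total = total * k + d   # Horner's rule
    return total


def agrees(machine, f, k, args):
    """Check value_k(M(n1;n2;...;nm)) == f(value_k(n1), ..., value_k(nm))."""
    output = machine(";".join(args))
    return value_k(output, k) == f(*(value_k(a, k) for a in args))


# Stand-in for a machine that adds two binary numbers (an assumption, not a TM).
def add_machine(s):
    return format(sum(int(p, 2) for p in s.split(";")), "b")


print(value_k("101", 2), value_k("101", 8))
print(agrees(add_machine, lambda x, y: x + y, 2, ("101", "1000")))
```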


Not all functions with straightforward definitions are computable. For example,
the busy beaver functions described in Section 25.1.4 measure the "productivity"
of Turing machines by returning the maximum amount of work
(measured in steps or in number of symbols on the tape) that can be done by a
Turing machine with n states. The busy beaver functions are not computable.

EXAMPLE 17.12 The Successor Function

Consider the successor function succ(n) = n + 1. On input □n□, M should output
□n + 1□.

We will represent n in binary without leading zeros. So n ∈ 0 ∪ 1{0, 1}* and
f(n) = m, where value_2(m) = value_2(n) + 1.

We  can  now  define  the  Turing  machine  M  to  compute  succ : 

1. Scan right until the first □. Then move one square back left so that the
read/write head is on the last digit of n.

2. Loop:

2.1. If the digit under the read/write head is a 0, write a 1, move the
read/write head left to the first blank, and halt.

2.2. If the digit under the read/write head is a 1, we need to carry. So write a
0, move one square to the left, and go back to the top of the loop.

2.3. If the digit under the read/write head is a □, we have carried all the
way to the left. Write a 1, move one square to the left, and halt.
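The steps above can be traced on a simulated tape. This is a minimal sketch, assuming the tape is a Python list with one □ on each side of n; the name `succ_tape` and the list representation are illustrative choices, and the final head repositioning is omitted because only the tape contents are returned.

```python
BLANK = "□"


def succ_tape(n: str) -> str:
    """Apply steps 1 through 2.3 to a tape holding □n□ and return the digits left on it."""
    tape = [BLANK] + list(n) + [BLANK]
    head = 0

    # Step 1: scan right to the first blank after n, then step back onto the last digit.
    head += 1
    while tape[head] != BLANK:
        head += 1
    head -= 1

    # Step 2: propagate the carry leftward.
    while True:
        if tape[head] == "0":        # 2.1: no further carry is needed
            tape[head] = "1"
            break
        elif tape[head] == "1":      # 2.2: write 0 and carry into the next column
            tape[head] = "0"
            head -= 1
        else:                        # 2.3: carried past the leftmost digit
            tape[head] = "1"
            break

    return "".join(tape).strip(BLANK)


print(succ_tape("101"))
print(succ_tape("111"))
```

On 111 the carry runs all the way to the blank at the left end of the tape, which is the case handled by step 2.3.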

In  our  macro  language,  M  is: 


We can build Turing machines to compute functions of two or more arguments by
encoding each of the arguments as a string and then concatenating them together,
separated by a delimiter.


EXAMPLE  17.13  Binary  Addition 

Consider the plus function defined on the integers. On input □x;y□, M should
output the sum of x and y.

We will represent x and y in binary without leading zeros. So, for example, we'll
encode the problem 5 + 8 as the input string 101;1000. On this input, M should
halt with 1101 on its tape. More generally, M should compute f(n1, n2) = m,
where value_2(m) = value_2(n1) + value_2(n2).

We leave the design of M as an exercise.

17.3 Adding Multiple Tapes and Nondeterminism

We  have  started  with  a  very  simple  definition  of  a  Turing  machine.  In  this  section  we 
will  consider  two  important  extensions  to  that  basic  model.  Our  goal  in  describing  the 
extensions  is  to  make  Turing  machines  easier  to  program.  But  we  don’t  want  to  do  that 
if  it  forces  us  to  give  up  the  simple  model  that  we  carefully  chose  because  it  would  be 
easy  to  prove  things  about.  So  we  are  not  going  to  add  any  fundamental  power  to  the 
model. For each of the extensions we consider, we will prove that, given a Turing
machine M that exploits the extension, there exists a Turing machine M' that is
equivalent to M and that does not exploit the new feature. Each of these proofs will be by
construction, from M to M'. This will enable us to place a bound on any change in time
complexity that occurs when we transform M to M'.

There  will  be  a  bottom  line  at  the  end  of  this  chapter. The  details  of  the  definition  of 
a  Turing  machine  don’t  matter  in  the  sense  that  they  don’t  affect  what  can  be  computed. 
In  fact,  there  is  a  large  family  of  other  computational  models  that  look  even  more  unlike 
the  basic  definition  than  our  extended  machines  do  but  that  are  still  equivalent  in 
power.  We  will  articulate  that  principle  in  the  following  chapter.  We  will  see,  however, 
that  the  details  may  matter  if  we  are  concerned  about  the  efficiency  of  the  computations 
that  we  do.  Even  here,  though,  the  details  matter  less  than  one  might  initially  think.  With 
one  exception  (the  addition  of  nondeterminism),  we’ll  see  that  adding  features  changes 
the  time  complexity  of  the  resulting  programs  by  at  most  a  polynomial  factor. 


17.3.1  Multiple  Tapes 

The first extension that we will propose is additional tapes. Suppose we could build a
Turing machine with two or three or more tapes, each with its own read/write head, as
shown in Figure 17.2. What could we do with such a machine? One answer is, “a lot less
going back and forth on the tape.”

A k-tape Turing machine, just like a 1-tape Turing machine, is a sixtuple
M = (K, Σ, Γ, δ, s, H). A configuration of a k-tape machine M is a k + 1 tuple:
(state, tape_1, ..., tape_k), where each tape description is identical to the description
we gave in Section 17.1.4 for a 1-tape machine. M's initial configuration will be
(s, □w, □, ..., □). In other words, its input will be on tape 1; all other tapes will
initially be blank, with their read/write heads positioned on some blank square. If M halts, we
will define its output to be the contents of tape 1; the contents of the other tapes will be
ignored.

FIGURE 17.2 A multiple tape Turing machine.

At  each  step,  M  will  examine  the  square  under  each  of  its  read/write  heads.  The  set 
of  values  so  obtained,  along  with  the  current  state,  determines  M’s  next  action.  It  will 
write  and  then  move  on  each  of  the  tapes  simultaneously.  Sometimes  M  will  want  to 
move  along  one  or  more  of  its  tapes  without  moving  on  others.  So  we  will  now  allow 
the move action, stay put, which we will write as ↑. So δ is a function from:

((K − H) × Γ_1 × Γ_2 × ... × Γ_k)

to

(K × Γ_1 × {←, →, ↑} × Γ_2 × {←, →, ↑} × ... × Γ_k × {←, →, ↑}).


EXAMPLE  17.14  Exploiting  Two  Tapes  to  Duplicate  a  String 

Suppose that we want to build a Turing machine that, on input □w□, outputs
□ww□. In Example 17.11 we saw how we could do this with a conventional, one-tape
machine that went back and forth copying each character of w one at a time.
To copy a string of length n took n passes, each of which took n steps, for a total of
n² steps. But to make that process straightforward, we left a blank between the two
copies. So then we had to do a second pass in which we shifted the copy one square
to the left. That took an additional n steps. So the entire process took O(n²) steps.
We now show how to do the same thing with a two tape machine M_C in O(n) steps.

Let w be the string to be copied. Initially, w is on tape 1 with the read/write
head just to its left. The second tape is empty. The operation of M_C is shown in the
following series of snapshots:

The first thing M_C will do is to move to the right on both tapes, one square at a
time, copying the character from tape 1 onto the corresponding square of tape 2.
This phase of the processing takes |w| steps. At the end of this phase, the tapes
will look like this, with both read/write heads on the blank just to the right of w:

Next M_C moves tape 2's read/write head all the way back to the left. This phase
also takes |w| steps. At the end of it, the tapes will look like this:

In its final phase, M_C will sweep to the right, copying w from tape 2 to tape 1.
This phase also takes |w| steps. At the end of it, the tapes will look like this:

M_C takes 3·|w| = O(|w|) steps.
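The three phases can be traced with two Python lists standing in for the two tapes. This is a sketch under stated assumptions: the function name `two_tape_duplicate` is hypothetical, and one step is counted per square handled in each phase, which matches the accounting in the text but ignores the constant number of extra moves a real machine would make.

```python
BLANK = "□"


def two_tape_duplicate(w: str):
    """Simulate the three phases of the two-tape copier; return (tape 1 contents, steps)."""
    n = len(w)
    tape1 = [BLANK] + list(w) + [BLANK] * (n + 1)   # □w followed by enough blanks
    tape2 = [BLANK] * (n + 2)                        # tape 2 starts out empty
    h1 = h2 = 1                                      # heads on the first square of w
    steps = 0

    # Phase 1: sweep right on both tapes, copying tape 1 onto tape 2.
    while tape1[h1] != BLANK:
        tape2[h2] = tape1[h1]
        h1 += 1
        h2 += 1
        steps += 1

    # Phase 2: move only tape 2's head back to the left end of the copy.
    while h2 > 1:
        h2 -= 1
        steps += 1

    # Phase 3: sweep right again, copying tape 2 onto the blanks of tape 1.
    while tape2[h2] != BLANK:
        tape1[h1] = tape2[h2]
        h1 += 1
        h2 += 1
        steps += 1

    return "".join(tape1).strip(BLANK), steps


print(two_tape_duplicate("abc"))
```

For a string of length 3 the step counter ends at 9, in line with the 3·|w| total above.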


EXAMPLE  17.15  Exploiting  Two  Tapes  for  Addition 

Exercise 17.3(a) asks you to construct a standard one-tape Turing machine to add
two binary numbers. Let's now build a 2-tape Turing machine M_A to do that. Let x
and y be arbitrary binary strings. On input □x;y, M_A should output □z, where z is
the binary encoding of the sum of the numbers represented by x and y.

For example, let x = 5 and y = 6. The initial configuration of tape 1 will be
□101;110. The second tape is empty:

In its first phase, M_A moves the read/write head of tape 1 all the way to the
right, copying x onto tape 2 and replacing it, on tape 1, with □s. It also replaces the ;
with a □. It then moves both read/write heads rightward to the last nonblank
square. At the end of this phase, y is on tape 1, x is on tape 2, and each read/write
head is pointing to the low-order digit of its number:

In its second phase, M_A moves back to the left, considering one pair of digits at
a time. It sums them, treating a □ on either tape as a 0, records the result on tape
1, and remembers the carry digit for the next sum. Once it has encountered a
blank on both tapes, it writes the carry digit if necessary and then it halts. At that
point, its tapes are:
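The column-by-column behavior of the second phase can be sketched as follows. The code abstracts away head movement: the function name `two_tape_add` and the use of Python lists for the two tapes are assumptions of this illustration, not the machine's actual transition table.

```python
def two_tape_add(tape_input: str) -> str:
    """Add two binary numbers given as 'x;y', following the two phases described above."""
    x, y = tape_input.split(";")

    # Phase 1: x ends up on tape 2, y remains on tape 1,
    # and both heads sit on the low-order digits.
    tape1 = list(y)
    tape2 = list(x)
    h1, h2 = len(tape1) - 1, len(tape2) - 1

    # Phase 2: move left one column at a time, remembering the carry.
    carry = 0
    result = []
    while h1 >= 0 or h2 >= 0:
        d1 = int(tape1[h1]) if h1 >= 0 else 0   # a blank is treated as 0
        d2 = int(tape2[h2]) if h2 >= 0 else 0
        total = d1 + d2 + carry
        result.append(str(total % 2))           # digit recorded on tape 1
        carry = total // 2
        h1 -= 1
        h2 -= 1
    if carry:
        result.append("1")                      # final carry digit, if necessary

    return "".join(reversed(result))


print(two_tape_add("101;110"))
```

On the example input 101;110 (5 + 6) the sketch returns 1011, the binary encoding of 11.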
THEOREM 17.1 Equivalence of Multitape and Single-Tape Turing Machines

Theorem: Let M = (K, Σ, Γ, δ, s, H) be a k-tape Turing machine, for some k > 1.
Then there is a standard Turing machine M' = (K', Σ', Γ', δ', s', H') such that
Γ ⊆ Γ', and each of the following conditions holds:

• For any input string x, M on input x halts with output z on the first tape iff M'
on input x halts at the same halting state (y, n, or h) and with z on its tape.

• If, on input x, M halts after n steps, then M' halts in O(n²) steps.
Proof: The proof is by construction. The idea behind the construction is that M'
will simulate M's k tapes by treating its single tape as though it were divided into
tracks. Suppose M has k tapes. Then an ordered k-tuple of values describes the
contents of each of the tapes at some particular location. We also need to record
the position of each of the k read/write heads. We do this by assigning two tracks
to each of M's tapes. The first track contains the value on the corresponding
square of the tape. The second track contains a 1 if the read/write head is over
that square and a 0 otherwise. Because all of M's tapes are infinite, we need a way
to line them up in order to be able to represent a slice through them. We will do
this by starting with M's initial configuration and then lining up the tapes so that
all the read/write heads form a single column.

To see how this works, let k = 2. Then M's initial configuration is shown in
Figure 17.3(a). M' will encode that pair of tapes on its single tape as shown in
Figure 17.3(b).

The tape for M', like every Turing machine tape, will contain □s on all but
some finite number of squares, initially equal to the length of the input string w.
But, if any of the read/write heads of M moves either left or right into the blank
area, M' will pause and encode the next square on its tape into tracks.

Like all standard Turing machines, when M' starts, its tape will contain its
input. The first thing it will do is to reformat its tape so that it is encoded as k
tracks, as shown above. It will then compute with the reformatted tape until it
halts. Its final step will be to reformat the tape again so that its result (the string
that is written on its simulated tape 1) is written, without the track encoding, on
the tape. So M' will need a tape alphabet that can encode both the initial and
final situations (a single character per tape square) and the encoding of k tapes
(with k values plus k read/write head bits per tape square). So M' needs a tape
alphabet that has a unique symbol for each element of Γ ∪ (Γ × {0, 1})^k. Thus
|Γ'| = |Γ| + (2·|Γ|)^k. For example, to do the encoding shown above requires
that Γ' contain symbols for □, a, b, (□, 1, □, 1), (a, 0, □, 0), (b, 0, □, 0), and so
forth. |Γ'| = 3 + 6² = 39.

FIGURE 17.3 Encoding multiple tapes as multiple tracks.
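The size of the track alphabet can be checked by enumerating it. In this sketch the helper name `track_alphabet` is hypothetical, plain symbols are represented as strings, and each multitrack symbol is represented as a flat tuple (value, head bit, value, head bit, ...).

```python
from itertools import product


def track_alphabet(gamma, k):
    """Build Γ ∪ (Γ × {0, 1})^k, with composite symbols as flat tuples."""
    per_tape = list(product(gamma, (0, 1)))        # Γ × {0, 1}: one value track, one head track
    composite = set()
    for combo in product(per_tape, repeat=k):      # one (value, head bit) pair per simulated tape
        composite.add(tuple(item for pair in combo for item in pair))
    return set(gamma) | composite


alphabet = track_alphabet(["□", "a", "b"], 2)
print(len(alphabet))
print(("a", 0, "□", 0) in alphabet)
```

With Γ = {□, a, b} and k = 2 the enumeration yields 3 plain symbols plus 6² composite ones, agreeing with the count of 39 above.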

M'  operates  as  follows: 

1. Set up the multitrack tape:

1.1. Move one square to the right to the first nonblank character on the tape.

1.2. While the read/write head is positioned over some non-□ character c do:

Write onto the square the symbol that corresponds to a c on tape 1
and □s on every other track. On the first square, use the encoding
that places a 1 on each even-numbered track (corresponding to the
simulated read/write heads). On every other square, use the encoding
that places a 0 on each even-numbered track.

2. Simulate the computation of M until (if) M would halt: (Each step will start with
the read/write head for M' on the □ immediately to the right of the divided
tape.)

2.1. Scan left and store in the state the k-tuple of characters under the
simulated read/write heads. Move back to the □ immediately to the right of
the divided tape.

2.2.  Scan  left  and  update  each  track  as  required  by  the  appropriate  transition 
of  M.  If  necessary,  subdivide  a  new  square  into  tracks. 

2.3.  Move  back  right. 

3. When M would halt, reformat the tape to throw away all but track 1, position
the read/write head correctly, and then go to M's halting state.

The construction that we just presented proves that any computation that can
be performed by a k-tape Turing machine can be performed by a 1-tape machine.
So adding any finite number of tapes adds no power to the Turing machine
model. But there is a difference: The 1-tape machine must execute multiple steps
for each single step taken by the k-tape machine. How many more? This question
is only well defined if M (and so M') halts. So, if M halts, let:

•  w  be  the  input  string  to  M ,  and 

•  n  be  the  number  of  steps  M  executes  before  it  halts. 

Each time M' executes step 2, it must make two passes over the nonblank segment
of its tape. How long is that segment? It starts out with length |w| but if M
ever moves off its input then M' will extend the encoded area and have to sweep
over the new section on each succeeding pass. So we do not know exactly the length
of the nonblank (encoded) part of the M' tape, but we can put an upper bound on
it by observing that M (and thus M') can write on at most one additional square at
each step. So an upper bound on the length of encoded tape is |w| + n.

We can now compute an upper bound on the number of steps it will take M' to
simulate the execution of M on w:

Step 1 (initialization):     = O(|w|).
Step 2 (computation):
    Number of passes         = n.
    Steps at each pass:
        For step 2.1         = 2·(length of tape)
                             = 2·(|w| + n).
        For step 2.2         = 2·(|w| + n).
    Total                    = O(n·(|w| + n)).
Step 3 (clean up):           = O(length of tape).
Total:                       = O(n·(|w| + n)).

If n ≥ |w| (which it will be most of the time, including in all cases in which M
looks at each square of its input at least once), then |w| + n ≤ 2n, so the total
number of steps executed by M' is O(n²).


17.3.2  Nondeterministic  Turing  Machines 

So far, all of our Turing machines have been deterministic. What happens if we relax
that restriction? Before we answer that question, let's review what we know so far
about nondeterminism:

• With FSMs, we saw that nondeterminism is a very useful programming tool. It
makes the task of designing certain classes of machines, including pattern matchers,
easy. So it reduces the likelihood of programmer error. But nondeterminism adds
no real power. For any NDFSM M, there exists an equivalent deterministic one M'.
Furthermore, although the number of states in M' may be as many as 2^K, where K
is the number of states in M, the time it takes to execute M' on some input string w
is O(|w|), just as it is for M.

•  With  PDAs,  on  the  other  hand,  we  saw  that  nondeterminism  adds  power. There  are 
context-free  languages  that  can  be  recognized  by  a  nondeterministic  PDA  for 
which  no  equivalent  deterministic  PDA  exists. 

So, now, what about Turing machines? The answer here is mixed:

• Nondeterminism adds no power in the sense that any computation that can be
performed by a nondeterministic Turing machine can be performed by a
corresponding deterministic one.

•  But  complexity  is  an  issue.  It  may  take  exponentially  more  steps  to  solve  a  problem 
using  a  deterministic  Turing  machine  than  it  does  to  solve  the  same  problem  with  a 
nondeterministic  Turing  machine. 

A nondeterministic Turing machine is a sixtuple (K, Σ, Γ, Δ, s, H), where K,
Σ, Γ, s, and H are as for standard Turing machines, and Δ is a subset of
((K − H) × Γ) × (K × Γ × {←, →}). In other words, we have replaced the
transition function δ by the transition relation Δ, in much the same way we did when we
defined nondeterministic FSMs and PDAs. The primary difference between our
definition of nondeterminism for FSMs and PDAs and our definition of nondeterminism
for Turing machines is that, since the operation of a Turing machine is not tied to
the read-only, one-at-a-time consumption of its input characters, the notion of an
ε-transition no longer makes sense.

But, just as before, we now allow multiple competing moves from a single configuration.
And, as before, the easiest way to envision the operation of a nondeterministic
Turing machine M is as a tree, as shown in Figure 17.4. Each node in the tree
corresponds to a configuration of M and each path from the root corresponds to one sequence
of configurations that M might enter.

Just  as  with  PDAs,  both  the  state  and  the  data  (in  this  case  the  tape)  can  be  different 
along  different  paths. 

Next  we  must  define  what  it  means  for  a  nondeterministic  Turing  machine  to: 

•  Decide  a  language. 

•  Semidecide  a  language. 

•  Compute  a  function. 

We  will  consider  each  of  these  in  turn. 

Nondeterministic  Deciding 

What does it mean for a nondeterministic Turing machine to decide a language? What
happens if the various paths disagree? The definition we will use is analogous to the
one we used for both FSMs and PDAs. Recall that a computation of M is a sequence of
configurations, starting in an initial configuration and ending in a halting configuration.

Let M = (K, Σ, Γ, Δ, s, H) be a nondeterministic Turing machine. Let w be an
element of Σ*. Then we will say that:

• M accepts w iff at least one of its computations accepts.

• M rejects w iff all of its computations reject.


FIGURE 17.4 Viewing nondeterminism as search through a space of computation
paths.

M decides a language L ⊆ Σ* iff, ∀w ∈ Σ*:

• There is a finite number of paths that M can follow on input w,

• All of those paths are computations (i.e., they halt), and

• w ∈ L iff M accepts w.
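The accept and reject conditions are asymmetric, which a small sketch makes explicit. Here each path of M on a fixed input w is assumed to have already been labeled with its outcome; the `Outcome` enumeration and the function names are illustrative, and in general no program can compute the LOOP label for an arbitrary machine.

```python
from enum import Enum


class Outcome(Enum):
    ACCEPT = "accept"   # the path halts and accepts
    REJECT = "reject"   # the path halts and rejects
    LOOP = "loop"       # the path never halts


def accepts(paths):
    """M accepts w iff at least one of its computations accepts."""
    return any(p is Outcome.ACCEPT for p in paths)


def rejects(paths):
    """M rejects w iff all of its computations reject."""
    return len(paths) > 0 and all(p is Outcome.REJECT for p in paths)


def all_paths_halt(paths):
    """The halting condition that a deciding machine must meet on every input."""
    return all(p is not Outcome.LOOP for p in paths)


mixed = [Outcome.REJECT, Outcome.ACCEPT, Outcome.LOOP]
print(accepts(mixed), rejects(mixed), all_paths_halt(mixed))
```

A single accepting path is enough for acceptance even when other paths reject or loop, but a machine with such a looping path does not satisfy the conditions for deciding.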


EXAMPLE  17.16  Exploiting  Nondeterminism  For  Finding  Factors 

Let COMPOSITES = {w ∈ {0, 1}*: w is the binary encoding of a composite
number}. We can build a nondeterministic Turing machine M to decide
COMPOSITES. M operates as follows on input w:

1. Nondeterministically choose two binary numbers p and q, both
greater than 1, such that |p| and |q| ≤ |w|. Write them on the tape,
after w, separated by ;. For example, consider the input string
110011. After this step, M's tape, along one of its paths, will look like:

□110011;111;1111□□

2. Multiply p and q and put the answer, A, on the tape, in place of p and
q. At this point, M's tape will look like:

□110011;1101001□□

3. Compare A and w. If they are equal, accept (i.e., go to y); else reject
(i.e., go to n).
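One way to see why M decides COMPOSITES is to enumerate every choice that step 1 could make and run steps 2 and 3 on each. The sketch below does this deterministically; the function names are hypothetical, and it assumes that p and q are written without leading zeros.

```python
from itertools import product


def candidates(max_len):
    """Yield every binary string without leading zeros, of length <= max_len, with value > 1."""
    for length in range(2, max_len + 1):
        for tail in product("01", repeat=length - 1):
            yield "1" + "".join(tail)


def branch_accepts(w, p, q):
    """Steps 2 and 3 on a single path: multiply p and q, then compare the product with w."""
    return int(p, 2) * int(q, 2) == int(w, 2)


def decides_composites(w):
    """Accept iff at least one of the finitely many (p, q) paths accepts."""
    choices = list(candidates(len(w)))
    return any(branch_accepts(w, p, q) for p in choices for q in choices)


print(branch_accepts("110011", "111", "1111"))
print(decides_composites("110011"))
print(decides_composites("111"))
```

The path shown in the example (p = 111, q = 1111) rejects, since 7 · 15 = 105, but 110011 encodes 51 = 3 · 17, so some other path accepts. Because |p| and |q| are bounded by |w|, there are finitely many paths and all of them halt.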


Nondeterministic  Semideciding 

Next we must decide what it means for a nondeterministic Turing machine to
semidecide a language. What happens if the various paths disagree? In particular, what
happens if some paths halt and others don't? Again, the definition that we will use requires
only that there exist at least one accepting path. We don't care how many nonaccepting
(looping or rejecting) paths there are. So we will say:

A nondeterministic Turing machine M = (K, Σ, Γ, Δ, s, H) semidecides a language
L ⊆ Σ* iff, ∀w ∈ Σ*:

• w ∈ L iff (s, □w) yields at least one accepting configuration. In other words, there
exists at least one path that halts and accepts w.

In the next example, as well as many others to follow, we will consider Turing
machines whose inputs are strings that represent descriptions of Turing machines. We
will describe later exactly how we can encode a Turing machine as a string. For now,
imagine it simply as a program written out as we have been doing. We will use the
notation <M> to mean the string that describes some Turing machine M (as
opposed to the abstract machine M, which we might actually encode in a variety of
different ways).

EXAMPLE 17.17 Semideciding by Simulation

Let  L  =  {<M>  :  M  is  a  Turing  machine  that  halts  on  at  least  one  string}.  We  will 
describe  later  how  one  Turing  machine  can  simulate  another.  Assuming  that  we 
can  in  fact  do  that,  a  Turing  machine  S  to  semidecide  L  will  work  as  follows  on 
input  <M> : 

1. Nondeterministically choose a string w in Σ* and write it on the tape.

2.  Run  M  on  w. 

3.  Accept. 

Any  individual  branch  of  S  will  halt  iff  M  halts  on  that  branch’s  string.  If  a 
branch  halts,  it  accepts.  So  at  least  one  branch  of  S  will  halt  and  accept  iff  there  is 
at  least  one  string  on  which  M  halts. 

As  we  will  see  in  Chapter  21,  semideciding  is  the  best  we  are  going  to  be  able 
to  do  for  L.  We  will  also  see  that  the  approach  that  we  have  taken  to  designing  S, 
namely  to  simulate  some  other  machine  and  see  whether  it  halts,  will  be  one  that 
we  will  use  a  lot  when  semideciding  is  the  best  that  we  can  do. 
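The branch structure of S can be illustrated with a deterministic interleaving of its branches. This interleaving is an addition made here for illustration, not part of the example. In the sketch, a Python generator function stands in for a Turing machine (each `yield` is one step, and returning means halting); that modeling, the function names, and the `max_rounds` budget are all assumptions.

```python
from itertools import count, product


def all_strings(alphabet):
    """Enumerate Σ* in order of increasing length."""
    for length in count(0):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)


def halts_on_some_string(machine, alphabet, max_rounds=None):
    """Return True once some branch halts; return None if the round budget runs out."""
    strings = all_strings(alphabet)
    running = []                                   # one simulated run per branch
    rounds = count() if max_rounds is None else range(max_rounds)
    for _ in rounds:
        running.append(machine(next(strings)))     # open the branch for the next string
        for run in running:
            try:
                next(run)                          # advance this branch by one step
            except StopIteration:
                return True                        # the branch halted, so S accepts
    return None                                    # no verdict within the budget


def halts_only_on_ab(w):
    """Toy machine: loops forever unless its input is exactly 'ab'."""
    while w != "ab":
        yield


print(halts_on_some_string(halts_only_on_ab, "ab", max_rounds=10))
```

Advancing every open branch one step per round means that a branch that loops forever cannot block a later branch that halts. With no budget, the procedure simply never returns when no string causes the machine to halt, which is the behavior expected of a semidecider.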


Nondeterministic  Function  Computation 

What about Turing machines that compute functions? Suppose, for example, that there
are two paths through some Turing machine M on input w and they each return a
different value. What value should M return? The first one it finds? Some sort of average
of the two? Neither of these definitions seems to capture what we mean by a
computation. And what if one path halts and the other doesn't? Should we say that M halts and
returns a value? We choose a strict definition:

A nondeterministic Turing machine M = (K, Σ, Γ, Δ, s, H) computes a function f
iff, ∀w ∈ Σ*:

• All paths that M can follow on input w halt (i.e., all paths are computations), and

• All of M's computations result in f(w).

Does  Nondeterminism  Add  Power? 

One of the most important results that we will prove about Turing machines is that
nondeterminism adds no power to the original model. Nondeterministic machines may
be easier to design and they may run substantially faster, but there is nothing that they
can do that cannot be done with some equivalent deterministic machine.

THEOREM  17.2  Nondeterminism  in  Deciding  and  Semideciding 
Turing  Machines 

Theorem: If a nondeterministic Turing machine M = (K, Σ, Γ, Δ, s, H) decides a
language L, then there exists a deterministic Turing machine M' that decides L. If
a nondeterministic Turing machine M semidecides a language L, then there exists
a deterministic Turing machine M' that semidecides L.

Proof Strategy: The proof will be by construction. The first idea we consider is the
one we used to show that nondeterminism does not add power to FSMs. There
we showed how to construct a new FSM M' that simulated the parallel execution
of all of the paths of the original FSM M. Since M had a finite number of states,
the number of sets of states that M' could be in was finite. So we simply
constructed M' so that its states corresponded to sets of states from M. But that
simple technique will not work for Turing machines because we must now consider
the tape. Each path will need its own copy of the tape. Perhaps we could solve
that problem by exploiting the technique from Section 17.3.1, where we used a
single tape to encode multiple tapes. But that technique depended on advance
knowledge of k, the number of tapes to be encoded. Since each path of M will
need a new copy of the tape, it isn't possible to put an a priori bound on k. So we
must reject this idea.

A second idea we might consider is simple depth-first search. If any path rejects,
M' will back up and try an alternative. If any path accepts, M' will halt and accept.
If M' explores the entire tree and all paths have rejected, then it rejects. But there
is a big problem with this approach. What if one of the early paths is one that
doesn't halt? Then M' will get stuck and never find some accepting path later in
the tree. If we are concerned only with finding deterministic equivalents for
nondeterministic deciding Turing machines, this is not an issue since all paths of any
deciding machine must halt. But we must also show that every nondeterministic
semideciding Turing machine has an equivalent deterministic machine. So we
must abandon the idea of a depth-first search.

But we can build an M' that conducts a breadth-first search of the tree of
computational paths that M generates. Suppose that there are never more than b
competing moves available from any configuration of M. And suppose that h is
the length of the longest path that M might have to follow before it can accept.
Then M' may require O(b^(h+1)) moves to find a solution since it may have to explore
an entire tree of height h. Is an exponential increase in the time it takes a
deterministic machine to simulate the computation of a nondeterministic one the best
we can do? No one knows. Most people will bet yes. Yet no one has been able to
prove that no better approach exists. A proof of the correctness of either a yes or
a no answer to this question is worth $1,000,000. We will return to this
question in Part V. There we will see that the standard way in which this question is
asked is, “Does P = NP?”

For now though we will continue with the search-based approach. To complete
this proof with such a construction requires that we show how to implement the
search process on a Turing machine. Because breadth-first search requires
substantial bookkeeping that is difficult to describe, we'll use an alternative but
computationally similar technique, iterative deepening. We describe the construction
in detail in E.1.
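Iterative deepening can be sketched independently of the Turing machine details. The following is an illustration only, not the construction of E.1: configurations are arbitrary Python values, and the caller supplies hypothetical `successors` and `is_accepting` functions that play the role of the transition relation and the accepting state.

```python
def depth_limited(config, limit, successors, is_accepting):
    """Depth-first search to a fixed depth. Returns (found_accept, some_path_was_cut_off)."""
    if is_accepting(config):
        return True, False
    nexts = successors(config)
    if not nexts:
        return False, False            # this path halted without accepting
    if limit == 0:
        return False, True             # this path continues beyond the current limit
    cut_off = False
    for nxt in nexts:
        found, cut = depth_limited(nxt, limit - 1, successors, is_accepting)
        if found:
            return True, False
        cut_off = cut_off or cut
    return False, cut_off


def iterative_deepening(start, successors, is_accepting, max_depth=None):
    """Accept (True), reject (False), or give up (None) once max_depth is exceeded."""
    depth = 0
    while max_depth is None or depth <= max_depth:
        found, cut_off = depth_limited(start, depth, successors, is_accepting)
        if found:
            return True
        if not cut_off:
            return False               # the whole tree was explored and no path accepted
        depth += 1
    return None


def toy_successors(config):
    """Toy tree: one branch loops forever, the other accepts after a few moves."""
    kind, i = config
    if kind == "root":
        return [("loop", 0), ("work", 0)]
    if kind == "loop":
        return [("loop", i + 1)]
    if kind == "work" and i < 3:
        return [("work", i + 1)]
    return []


print(iterative_deepening(("root", 0), toy_successors, lambda c: c == ("work", 3)))
```

In the toy tree the first branch never halts, so plain depth-first search would be stuck on it forever. Because each round is cut off at a fixed depth, the search finds the accepting path on the second branch once the limit is large enough.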




THEOREM  17.3  Nondeterminism  in  Turing  Machines  That  Compute  Functions 

Theorem: If a nondeterministic Turing machine M = (K, Σ, Γ, Δ, s, H) computes
a function f, then there exists a deterministic Turing machine M' that computes f.

Proof:  The  proof  is  by  construction.  It  is  very  similar  to  the  proof  of  Theorem  17.2 
and  is  left  as  an  exercise. 


17.4  Simulating  a  "Real"  Computer  * 

We've now seen that adding multiple tapes does not increase the power of Turing
Machines. Neither does adding nondeterminism. What about adding features that would
make a Turing Machine look more like a standard computer? Consider, for example, a
simple computer that is composed of:


• An unbounded number of memory cells addressed by the integers starting at 0. These
memory cells may be used to contain both program instructions and data. We'll
encode both in binary. Assume no limit on the number of bits that are stored in each cell.

• An instruction set composed of basic operations including read (R), move input
pointer right or left (MIR, MIL), load (L), store (ST), add (A), subtract (S), jump
(JUMP), conditional jump (CJUMP), and halt (H). Here's a simple example program:


R 10          /* Read 2 bits from the input tape and put them into the accumulator.
MIR 10        /* Move the input pointer two bits to the right.
CJUMP 1001    /* If the value in the accumulator is 0, jump to location 1001.
A 10111       /* Add to the value in the accumulator the value at location 10111.
ST 10111      /* Store the result back in location 10111.


•  A  program  counter. 

•  An  address  register. 

•  An  accumulator  in  which  operations  are  performed. 

•  A  small  fixed  number  of  special  purpose  registers. 

•  An  input  file. 

•  An  output  file. 


Can  a  Turing  machine  simulate  the  operation  of  such  a  computer?  The  answer  is  yes. 


THEOREM  17.4  A  Real  Computer  Can  be  Simulated  by  a  Turing  Machine 

Theorem: A random-access, stored program computer can be simulated by a Turing machine. If the computer requires n steps to perform some operation, the Turing machine simulation will require O(n^6) steps.

Proof: The proof is by construction of a simulator we'll call simcomputer.

The simulator simcomputer will use 7 tapes:

•  Tape 1 will hold the computer's memory. It will be organized as a series of (address, value) pairs, separated by the delimiter #. The addresses will be represented in binary. The values will also be represented in binary. This means that we need a binary encoding of programs such as the addition one we saw above. We'll use the first 4 bits of any instruction word for the operation code. The remainder of the word will store the address. So tape 1 will look like this:

#0,value0#1,value1#10,value2#11,value3#100,value4# ... #

With an appropriate assignment of operations to binary encodings, our example program, if stored starting at location 0, would look like:

#0,000110010#ll11111001#101001110011#ll, 001010111#. . . . 

Notice  that  we  must  explicitly  delimit  the  words  because  there  is  no  bound 
on  their  length.  Addresses  may  get  longer  as  the  simulated  program  uses  more 
words  of  its  memory.  Numeric  values  may  increase  as  old  values  are  added  to 
produce  new  ones. 

•  Tape 2 will hold the program counter, which is just an index into the memory stored on tape 1.

•  Tape  3  will  hold  the  address  register. 

•  Tape  4  will  hold  the  accumulator. 

•  Tape  5  will  hold  the  operation  code  of  the  current  instruction. 

•  Tape  6  will  hold  the  input  file. 

•  Tape 7 will hold the output file, which will initially be blank.
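
The layout of tape 1 described above can be sketched in a few lines of Python. This is only an illustration: the function names `encode_tape1` and `decode_tape1` are hypothetical, and values are assumed to be given already as bit strings.

```python
def encode_tape1(memory):
    """Write memory (int address -> bit-string value) as #addr,value#...#.

    Addresses are written in binary and every word is delimited by '#',
    because neither addresses nor values have a fixed length.
    """
    cells = (f"{address:b},{value}" for address, value in sorted(memory.items()))
    return "#" + "#".join(cells) + "#"


def decode_tape1(tape):
    """Recover the address -> value mapping from the delimited string."""
    memory = {}
    for cell in tape.strip("#").split("#"):
        if cell:  # skip the empty piece produced by an empty memory
            address, value = cell.split(",")
            memory[int(address, 2)] = value
    return memory
```

Because each word carries its own address, a simulator can find any cell with a single left-to-right scan, at the cost of time linear in the length of the tape.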

Like all other multitape Turing machines, simcomputer will begin with its input on tape 1 and all other tapes blank. Simcomputer requires two inputs, the program to be simulated and the input on which the simulation is to be run. So we will encode them both on tape 1, separated by a special character that we will write as %.

We will assume that the program is stored starting in memory location 0, so the program counter will initially need to be initialized to 0. The simulator simcomputer operates as follows:

simcomputer(program) =

/* Initialize.

1. Move the input string to tape 6.

2. Initialize the program counter (tape 2) to 0.

/* Execute one pass through this loop for every instruction executed by program.

3. Loop:

3.1.  Starting  at  the  left  of  the  nonblank  portion  of  tape  1,  scan  to  the  right 
looking  for  an  index  that  matches  the  contents  of  tape  2  (the  program 
counter). 

/*  Decode  the  current  instruction  and  increment  the  program  counter. 

3.2.  Copy  the  operation  code  to  tape  5. 

3.3.  Copy  the  address  to  tape  3. 

3.4.  Add  1  to  the  value  on  tape  2. 

/* Retrieve the operand.

3.5.  Starting  at  the  left  again,  scan  to  the  right  looking  for  the  address  that  is 
stored  on  tape  3. 

/*  Execute  the  instruction. 

3.6.  If  the  operation  is  Load,  copy  the  operand  to  tape  4  (the  accumulator). 

3.7.  If  the  operation  is  Add,  add  the  operand  to  the  value  on  tape  4. 

3.8.  If  the  operation  is  Jump,  copy  the  value  on  tape  3  to  tape  2  (the  program 
counter). 

3.9.  And  so  forth  for  the  other  operations. 
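
The loop above can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not in the text: memory is a list of (address, value) pairs standing in for tape 1, instruction words are (opcode, address) tuples rather than bit strings, an address that is not yet on the tape reads as 0, and the input and output tapes (R, MIR, MIL) are left out. The `max_steps` guard is also an addition.

```python
def find(tape1, address):
    """Scan the (address, value) pairs left to right, as in steps 3.1 and 3.5."""
    for index, (addr, _) in enumerate(tape1):
        if addr == address:
            return index
    return None


def simcomputer(tape1, max_steps=10_000):
    """Run the program stored on tape1 and return (accumulator, tape1) at a halt."""
    pc = 0    # tape 2: program counter
    acc = 0   # tape 4: accumulator
    for _ in range(max_steps):
        op, addr = tape1[find(tape1, pc)][1]   # steps 3.1-3.3: fetch and decode
        pc += 1                                # step 3.4
        slot = find(tape1, addr)               # step 3.5: locate the operand
        operand = tape1[slot][1] if slot is not None else 0  # assumption: unset cells read as 0
        if op == "H":
            return acc, tape1
        if op == "L":
            acc = operand
        elif op == "A":
            acc += operand
        elif op == "S":
            acc -= operand
        elif op == "ST":
            if slot is None:
                tape1.append((addr, acc))      # a store may create a new memory word
            else:
                tape1[slot] = (addr, acc)
        elif op == "JUMP":
            pc = addr
        elif op == "CJUMP":
            if acc == 0:
                pc = addr
        else:
            raise ValueError(f"unknown operation {op!r}")
    raise RuntimeError("step limit reached")
```

Both calls to `find` are linear scans of the whole memory, which is the source of the per-instruction cost analyzed next.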

How many steps must simcomputer execute to simulate a program that runs in n steps? It executes the outer loop of step 3 n times. How many steps are required at each pass through the loop? Step 3.1 may take t steps, if t is the length of tape 1. Step 3.2 takes a constant number of steps. Step 3.3 may take a steps if a is the number of bits required to store the longest address that is used on tape 1. Step 3.4 may also take a steps. Step 3.5 again may have to scan all of tape 1, so it may take t steps. The number of steps required to execute the instruction varies:

•  Addition  takes  v  steps  if  v  is  the  length  of  the  longer  operand. 

•  Load  takes  v  steps  if  v  is  the  length  of  the  value  to  be  loaded. 

•  Store generally takes v steps if v is the length of the value to be stored. However, suppose that the value to be stored is longer than the value that is already stored at that location. Then simcomputer must shift the remainder of Tape 1 one square to the right in order to have room for the new value. So executing a Store instruction could take t steps (where t is the length of tape 1).

The remainder of the operations can be analyzed similarly. Notice that we have included no complex operations like multiply. (But this is not a limitation. Multiply can be implemented as a sequence of additions.) So it is straightforward to see that the number of steps required to perform any of the operations that we have defined is, in the worst case, a linear function of t, the length of tape 1.

So how long is tape 1? It starts out at some length k. Each instruction has the ability to increase the number of memory locations by 1 since a store instruction can store to an address that was not already represented on the tape. And each instruction has the ability to increase by 1 the length of a machine “word”, since the add instruction can create a value that is one bit longer than either of its operands. So, after n simulated steps, t, the length of the tape, could be k + n^2 (if new words are created and each word gets longer). If we assume that n ≥ k, we can say that the length of the tape, after n steps, is O(n^2). So the number of steps that simcomputer must execute to simulate each step of the original program is O(n^2). Since simcomputer must simulate n steps of the original program, the total number of steps executed by simcomputer is O(n^3).

The simulator simcomputer uses 7 tapes. We know, from Theorem 17.1, that a k-tape Turing machine that executes n steps can be simulated in O(n^2) steps by a one-tape, standard Turing machine. So the total number of steps it would take a one-tape standard Turing machine to simulate one of our programs executing n steps is O(n^6). While this represents a nontrivial increase in the number of steps, it is important to note that the increase is a polynomial function of n. It does not grow exponentially, the way the simulation of a nondeterministic Turing machine did.
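
The chain of bounds in the proof can be collected in one place. The following is only a restatement of the argument above, using the same symbols (k for the initial tape length, t for the current tape length, n for the number of simulated steps):

```latex
\begin{align*}
t &\le k + n^2 = O(n^2) && \text{(assuming } n \ge k\text{)} \\
\text{cost of one simulated instruction} &= O(t) = O(n^2) \\
\text{cost of } n \text{ instructions on 7 tapes} &= n \cdot O(n^2) = O(n^3) \\
\text{cost on one tape (Theorem 17.1)} &= O\big((n^3)^2\big) = O(n^6)
\end{align*}
```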

Any  program  that  can  be  written  in  any  modern  programming  language  can  be 
compiled  into  code  for  a  machine  such  as  the  simple  random  access  machine  that  we 
have  just  described.  Since  we  have  shown  that  any  such  machine  can  be  simulated  by  a 
Turing  machine,  we  will  begin  to  use  clear  pseudocode  to  define  Turing  machines. 


17.5  Alternative  Turing  Machine  Definitions  • 

We  have  provided  one  definition  for  what  a  Turing  machine  is  and  how  it  operates. 
There are many equivalent alternatives. In this section we will explore two of them.


17.5.1  One-Way  vs.  Two-Way  Infinite  Tape 

Many books define a Turing machine to have a tape that is infinite in only one direction. We use a two-way infinite tape. Does this difference matter? In other words, are there any problems that one kind of machine can solve that the other one cannot? The answer is no.


THEOREM  17.5  A  One-Way  Infinite  Tape  is  Equivalent  to  a  Two-Way 
Infinite  Tape 

Theorem:  Any  computation  by  a  Turing  machine  with  a  two-way  infinite  tape  can 
be  simulated  by  a  Turing  machine  with  a  one-way  infinite  tape. 

Proof: Let M be a Turing machine with a two-way infinite tape. We describe M', an equivalent machine whose tapes are infinite in only one direction. M' will use three tapes. The first will hold that part of M's tape that starts with the square under the read/write head and goes to the right. The second M' tape will hold that part of M's tape to the left of the read/write head. The third tape will count, in unary, the number of moves that M has made so far. An example of this encoding is shown in Figure 17.5.

FIGURE 17.5  Simulating a two-way infinite tape on a one-way infinite tape. (Top: the two-way tape. Bottom: the simulation.)


M's read/write head is shown above as a dashed arrow. M' has three read/write heads (shown as dark arrows above), one for each tape. It will use its finite state controller to keep track of whether the simulated read/write head is on tape 1 or tape 2. If the simulated read/write head is on tape 1, square t, then the M' tape 1 read/write head will be on square t and its tape 2 read/write head will be on the leftmost square. Similarly if the simulated read/write head is on tape 2.

Initially, M' tape 1 will be identical to M's tape, M' tape 2 will be blank, and the M' tape 3 will also be blank (since no moves have yet been made).

The simulation: M' simulates each step of M. If M attempts to move to the left, off the end of its tape, M' will begin writing at the left end of tape 2. If M continues to move left, M' will move right on tape 2. If M moves right and goes back onto its original tape, M' will begin moving right on tape 1. If M would halt, then M' halts the simulation.

But, if M' is computing a function, then M' must also make sure, when it halts, that its tape 1 contains exactly what M's tape would have contained. Some of that may be on tape 2. If it is, then the contents of tape 1 must be shifted to the right far enough to allow the contents of tape 2 to be moved up. The maximum number of symbols that M' may have written on tape 2 is n, where n is the number of steps executed by M. Tape 3 contains n. So M' moves n squares to the right on tape 2. Then it moves leftward, one square at a time as long as it reads only blanks. Each time it moves to the left, it erases a 1 from tape 3. When it hits the first nonblank character, tape 3 will contain the unary representation of the number of times M' must shift tape 1 one square to the right and then copy one symbol from tape 2 to tape 1. M' executes this shifting process the required number of times and then halts.
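
The idea of the proof can be illustrated with a small Python sketch in which two lists that grow only to the right play the roles of tapes 1 and 2. The class name `OneWayPair` and its methods are hypothetical. Since a Python list knows its own length, the unary counter on tape 3 is not needed here; the final merge in `contents` stands in for the shifting step.

```python
class OneWayPair:
    """A two-way infinite tape stored as two one-way lists."""

    BLANK = "□"

    def __init__(self, contents=""):
        self.right = list(contents) or [self.BLANK]  # tape 1: squares 0, 1, 2, ...
        self.left = []                               # tape 2: squares -1, -2, -3, ...
        self.pos = 0                                 # position of the simulated head

    def _cell(self):
        """Return (list, index) for the square under the simulated head."""
        if self.pos >= 0:
            tape, k = self.right, self.pos
        else:
            tape, k = self.left, -self.pos - 1       # square -1 is index 0 of tape 2
        while len(tape) <= k:
            tape.append(self.BLANK)                  # extend the one-way tape on demand
        return tape, k

    def read(self):
        tape, k = self._cell()
        return tape[k]

    def write(self, ch):
        tape, k = self._cell()
        tape[k] = ch

    def move(self, direction):
        self.pos += 1 if direction == "R" else -1

    def contents(self):
        """Merge both halves into one string, trimming surrounding blanks."""
        return "".join(self.left[::-1] + self.right).strip(self.BLANK)
```

Moving left from square 0 simply switches from indexing `right` to indexing `left`, which mirrors how M' keeps track in its finite state controller of which tape the simulated head is on.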

17.5.2  Stacks vs. a Tape

When  we  switched  from  working  with  PDAs  to  working  with  Turing  machines,  we  gave 
up  the  use  of  a  stack.  The  Turing  machine's  infinite  tape  has  given  us  more  power  than 
we  had  with  the  PDA’s  stack.  But  it  makes  sense  to  take  one  more  look  at  the  stack  as 
a  memory  device  and  to  ask  two  questions: 

•  Did we lose anything by giving up the PDA's stack in favor of the Turing machine's tape?

•  Could  we  have  gotten  the  power  of  a  Turing  machine’s  tape  using  just  stacks? 

Simulating  a  Stack  by  a  Turing  Machine  Tape 


THEOREM  17.6  A PDA can be Simulated by a Turing Machine

Theorem: The operation of any PDA P can be simulated by some Turing machine M.

Proof:  The  proof  is  by  construction.  Given  some  PDA  P,  we  construct  a  (possibly) 
nondeterministic  Turing  machine  M  to  simulate  the  operation  of  P.  Since  there  is 
a  finite  number  of  states  in  P ,  M  can  keep  track  of  the  current  state  of  P  in  its  own 
finite  state  controller. 

Each branch of M will use two tapes, one for the input and one for the stack, as shown in Figure 17.6. Tape 1 will function just like the read-only stream of input that is fed to the PDA. M will never write on tape 1 and will only move to the right, one square at a time. Tape 2 will mimic the behavior of P's stack, with its read/write head moving back and forth as symbols are pushed onto and popped from the stack.

FIGURE 17.6  Simulating a PDA by a Turing machine. Tape 1 holds the input; Tape 2 corresponds to the stack.

M will operate as follows:

1. Initialization: Write #, indicating the bottom of the stack, under the read/write head of Tape 2. Tape 2's read/write head will always remain positioned on the top of the stack. Set the simulated state S_sim to s.

2. Simulation: Let the character under the read/write head of Tape 1 be c.

At each step of the operation of P do:

2.1. If c = □, halt and accept if S_sim is an accepting state of P and reject otherwise.

2.2. Nondeterministically choose from Δ a transition of the form ((S_sim, c, pop), (q2, push)) or ((S_sim, ε, pop), (q2, push)). In other words, choose some transition from the current state that either reads the current input character or reads ε.

2.3. Scan left on Tape 2 |pop| squares, blanking out each square and checking to see whether Tape 2 matches pop. If it does not, terminate this path. If it does, then move right on Tape 2 |push| squares, copying push onto Tape 2.

2.4. If we are not following an ε-transition, move the read/write head of Tape 1 one square to the right and set c to the character on that square.

2.5. Set S_sim to q2 and repeat.


So  we  gave  up  no  power  when  we  abandoned  the  PDA’s  stack  in  favor  of  the  Turing 
machine's  tape. 
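
The simulation steps above can be tried out with a short Python sketch. It differs from the construction in ways that should be read as assumptions: the nondeterministic choice of step 2.2 is replaced by a breadth-first exploration of all configurations, ε is written as the empty string, the first character of a pop or push string is taken to be the one nearest the top of the stack, ε-moves are still tried after the input is exhausted, and a hypothetical `max_configs` bound stops runaway searches.

```python
from collections import deque


def pda_accepts(delta, start, accepting, w, max_configs=100_000):
    """delta: list of ((state, c, pop), (state2, push)); c == "" is an epsilon move."""
    first = (start, 0, ())                 # (S_sim, head position on tape 1, stack with top last)
    frontier = deque([first])
    seen = {first}
    while frontier and len(seen) <= max_configs:
        state, pos, stack = frontier.popleft()
        if pos == len(w) and state in accepting:          # step 2.1
            return True
        for (q, c, pop), (q2, push) in delta:             # step 2.2: try every choice
            if q != state:
                continue
            if c and (pos == len(w) or w[pos] != c):
                continue
            keep = len(stack) - len(pop)
            if keep < 0 or "".join(reversed(stack[keep:])) != pop:
                continue                                  # step 2.3: this path dies
            new_stack = stack[:keep] + tuple(reversed(push))
            cfg = (q2, pos + (1 if c else 0), new_stack)  # steps 2.4 and 2.5
            if cfg not in seen:
                seen.add(cfg)
                frontier.append(cfg)
    return False
```

As a usage example, the transitions below describe a PDA for strings of the form a^n b^n that pushes a bottom marker #, pushes one a per input a, and then pops one a per input b:

```python
AnBn = [
    ((0, "", ""), (1, "#")),
    ((1, "a", ""), (1, "a")),
    ((1, "", ""), (2, "")),
    ((2, "b", "a"), (2, "")),
    ((2, "", "#"), (3, "")),
]
```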

Simulating  a  Turing  Machine  Tape  by  Using  Two  Stacks 

What about the other way around? Is there any way to use stacks to get the power of an infinite, writeable tape? The answer is yes. Any Turing machine M can be simulated by a PDA P with two stacks. Suppose that M's tape is as shown in Figure 17.7 (a). Then P's two stacks will be as shown in Figure 17.7 (b).

Stack 1 contains M's active tape up to and including the square that is currently under the read/write head. Stack 2 contains the remainder of M's active tape. If M moves to the left, the top character from stack 1 is popped and then pushed onto stack 2. If M moves onto the blank region to the left of its tape, then the character that it writes is simply pushed onto the top of stack 1. If M moves to the right, the top character from stack 2 is popped and then pushed onto stack 1. If M moves onto the blank region to the right of its tape, then the character that it writes is simply pushed onto the top of stack 1.

FIGURE 17.7  Simulating a Turing machine tape with two stacks.
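
A minimal Python sketch of the two-stack representation follows. The class name `TwoStackTape` and its method names are hypothetical, and Python lists are used as stacks with the top at the end. When the head moves past either end of the active tape, the sketch pushes a blank onto stack 1, which the machine may then overwrite.

```python
class TwoStackTape:
    """A Turing machine tape represented by two stacks."""

    BLANK = "□"

    def __init__(self, contents=""):
        chars = list(contents) or [self.BLANK]
        self.stack1 = [chars[0]]      # tape up to and including the head square (top = head)
        self.stack2 = chars[:0:-1]    # rest of the tape (top = square just right of the head)

    def read(self):
        return self.stack1[-1]

    def write(self, ch):
        self.stack1.pop()             # replace the square under the head
        self.stack1.append(ch)

    def left(self):
        self.stack2.append(self.stack1.pop())
        if not self.stack1:           # moved onto the blank region to the left
            self.stack1.append(self.BLANK)

    def right(self):
        if self.stack2:
            self.stack1.append(self.stack2.pop())
        else:                         # moved onto the blank region to the right
            self.stack1.append(self.BLANK)

    def contents(self):
        return "".join(self.stack1 + self.stack2[::-1])
```

Every head move is a constant number of pushes and pops, so the simulation costs only a constant factor per step of M.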

17.6  Encoding  Turing  Machines  as  Strings 

So far, all of our Turing machines have been hardwired (just like early computers). Does it make sense, just as it did with real computers, to develop a programmable Turing machine: a single Turing machine that accepts as input a (M: Turing machine, s: input string) pair and outputs whatever M would output when started up on s? The answer is yes. We will call such a device the universal Turing machine or simply U.

To  define  U  we  need  to  do  two  things: 

1. Define an encoding scheme that can be used to describe to U a (Turing machine, input string) pair.

2.  Describe  the  operation  of  U  given  such  an  encoded  pair. 

17.6.1  An  Encoding  Scheme  for  Turing  Machines 

We need to be able to describe an arbitrary Turing machine M = (K, Σ, Γ, δ, s, H) as a string that we will write as <M>. When we define the universal Turing machine, we will have to assign it a fixed input alphabet. But the machines we wish to input to it may have an arbitrary number of states and they may exploit alphabets of arbitrary size. So we need to find a way to encode an arbitrary number of states and a tape alphabet of arbitrary size using some new alphabet of fixed size. The obvious solution is to encode both state sets and alphabets as binary strings.

We begin with K. We will determine i, the number of binary digits required to encode the numbers from 0 to |K| - 1. Then we will number the states from 0 to |K| - 1 and assign to each state the binary string of length i that corresponds to its assigned number. By convention, the start state s will be numbered 0. The others may be numbered in any order. Let t' be the binary string assigned to state t. Then we assign strings to states as follows:

•  If t is the halting state y, assign it the string yt'.

•  If t is the halting state n, assign it the string nt'.

•  If t is any other state, assign it the string qt'.

EXAMPLE  17.18  Encoding  the  States  of  a  Turing  Machine 

Suppose that we are encoding a Turing machine M with 9 states. Then it will take four binary digits to encode the names of the 9 states. The start state s will be encoded as q0000. Assuming that y has been assigned the number 3 and n has been assigned the number 4, the remaining states will be encoded as q0001, q0010, y0011, n0100, q0101, q0110, q0111, and q1000.

Next we will encode the tape alphabet in a similar fashion. We will begin by determining j, the number of binary digits required to encode the numbers from 0 to |Γ| - 1. Then we will number the characters (in any order) from 0 to |Γ| - 1 and assign to each character the binary string of length j that corresponds to its assigned number. Finally, we will assign to each symbol y the string ay', where y' is the binary string already assigned to y.


EXAMPLE  17.19  Encoding  the  Tape  Alphabet  of  a  Turing  Machine 

Suppose that we are encoding a Turing machine M with Γ = {□, a, b, c}. Then it will take two binary digits to encode the names of the four characters. The assignment of numbers to the characters is arbitrary. It just must be done consistently throughout the encoding. So, for example, we could let:

□ = a00

a = a01

b = a10

c = a11
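
The state and tape-alphabet encodings of Examples 17.18 and 17.19 can be reproduced with a short Python sketch. The function names are hypothetical, and the states are numbered in the order in which they are passed in (after moving the start state to position 0), which is one of the arbitrary numberings the text allows.

```python
def bits_needed(count):
    """Number of binary digits needed to write the numbers 0 .. count - 1."""
    return max(1, (count - 1).bit_length())


def encode_states(states, start, accept=None, reject=None):
    """Map each state to q, y, or n followed by its number in binary."""
    ordered = [start] + [q for q in states if q != start]   # start state is number 0
    i = bits_needed(len(ordered))
    codes = {}
    for number, q in enumerate(ordered):
        prefix = "y" if q == accept else "n" if q == reject else "q"
        codes[q] = prefix + format(number, f"0{i}b")
    return codes


def encode_alphabet(symbols):
    """Map each tape symbol to 'a' followed by its number in binary."""
    j = bits_needed(len(symbols))
    return {s: "a" + format(n, f"0{j}b") for n, s in enumerate(symbols)}
```

With nine states, `bits_needed` returns 4, matching the four binary digits used in Example 17.18; with four tape symbols it returns 2, as in Example 17.19.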


Next we need a way to encode the transitions of δ, each of which is a 5-tuple: (state, input character, state, output character, move). We have just described how we will encode states and tape characters. There are only two allowable moves, → and ←, so we can just use those two symbols to stand for their respective moves. We will encode each transition in δ as a string of exactly the form (state, character, state, character, move), using the state, character, and move encodings that we have just described. Then we can specify δ as a list of transitions separated by commas.

With these conventions, we can completely specify almost all Turing machines simply as a list of transitions. But we must also consider the special case of the simple Turing machine shown in Figure 17.8. M_none has no transitions but it is a legal Turing machine (that semidecides Σ*). To enable us to represent machines like M_none, we add one more convention: When encoding a Turing machine M, for any state q in M that has no incoming transitions, add to M's encoding the substring (q). So M_none would be encoded as simply (q0).

FIGURE 17.8  Encoding a Turing machine with no transitions.

EXAMPLE 17.20  Encoding a Complete Turing Machine Description

Consider M = ({s, q, h}, {a, b, c}, {□, a, b, c}, δ, s, {h}), where δ = [transition table not reproduced].

We start encoding M by determining encodings for each of its states and tape symbols:

[Table of state/symbol encodings not reproduced; its last entry is a11.]

The complete encoding of M, which we will denote by <M>, is then:

(q00,a00,q01,a00,→), (q00,a01,q00,a10,→), (q00,a10,q01,a01,←),
(q00,a11,q01,a10,←), (q01,a00,q00,a01,→), (q01,a01,q01,a10,—),
(q01,a10,q01,a10,—), (q01,a11,q10,a01,—).
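
Assembling a complete encoding from the pieces is mechanical, as the following Python sketch suggests. The function `encode_tm` and its `lone_states` parameter are hypothetical names; the state and symbol code tables are assumed to have been built already, and the caller is assumed to list the states that need the extra (q) substring.

```python
def encode_tm(transitions, state_code, symbol_code, lone_states=()):
    """Encode transitions (state, char, state, char, move) as the string <M>.

    lone_states lists the states that must appear as a bare (q) item, as in the
    convention introduced for machines like M_none.
    """
    items = [
        "(" + ",".join((state_code[q], symbol_code[c],
                        state_code[q2], symbol_code[c2], move)) + ")"
        for q, c, q2, c2, move in transitions
    ]
    items += ["(" + state_code[q] + ")" for q in lone_states]
    return ",".join(items)
```

For example, with the illustrative tables `{"s": "q00", "q": "q01"}` and `{"□": "a00", "a": "a01", "b": "a10"}`, the two transitions `("s", "□", "q", "□", "→")` and `("s", "a", "s", "b", "→")` produce a string of the same shape as the first two items of the encoding above.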


17.6.2  Enumerating  Turing  Machines 

Now  that  we  have  an  encoding  scheme  for  Turing  machines,  it  is  possible  to  create  an 
enumeration  of  them. 

THEOREM  17.7  We  can  Lexicographically  Enumerate  the  Valid 
Turing  Machines 

Theorem: There exists an infinite lexicographic enumeration of:

a. All syntactically valid Turing machines.

b. All syntactically valid Turing machines whose input alphabet is some particular set Σ.

c. All syntactically valid Turing machines whose input alphabet is some particular set Σ and whose tape alphabet is some particular set Γ.

Proof: Fix an alphabet Σ = {(, ), a, q, y, n, 0, 1, comma, →, ←}, i.e., the set of characters that are used in the Turing machine encoding scheme that we just described. Let the symbols in Σ be ordered as shown in the list we just gave. The following procedure lexicographically enumerates all syntactically valid Turing machines:

1. Lexicographically enumerate the strings in Σ*.

2. As each string s is generated, check to see whether it is a syntactically valid Turing machine description. If it is, output it.

To enumerate just those Turing machines whose input and/or tape alphabets are limited to the symbols in some particular sets Σ and Γ, add, to step 2, a check to see that only alphabets of the appropriate sizes are allowed.


With this procedure in hand, we can now talk about the i-th Turing machine. It is the i-th element generated by the enumeration procedure.
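
The enumeration procedure can be sketched in Python with a generator. The validity check used here is a deliberate simplification and an assumption of this sketch: it only tests that a string has the shape of a comma-separated list of quintuples and bare (state) items, and does not check, for instance, that all state codes have the same length. The names `SIGMA`, `is_valid`, `enumerate_tms`, and `first` are hypothetical.

```python
import re
from itertools import count, islice, product

# The encoding alphabet, in the order used for the lexicographic enumeration.
SIGMA = ["(", ")", "a", "q", "y", "n", "0", "1", ",", "→", "←"]

TRANSITION = r"\([qyn][01]+,a[01]+,[qyn][01]+,a[01]+,[→←]\)"
LONE_STATE = r"\([qyn][01]+\)"
ITEM = f"(?:{TRANSITION}|{LONE_STATE})"
VALID = re.compile(f"{ITEM}(?:,{ITEM})*")


def is_valid(s):
    """Simplified syntax check for a Turing machine description."""
    return VALID.fullmatch(s) is not None


def enumerate_tms():
    """Yield valid descriptions, shortest first and in SIGMA order within a length."""
    for length in count(0):
        for chars in product(SIGMA, repeat=length):   # step 1: enumerate the strings
            s = "".join(chars)
            if is_valid(s):                           # step 2: filter
                yield s


def first(n):
    """The first n machines of the enumeration."""
    return list(islice(enumerate_tms(), n))
```

Under this simplified check, the shortest descriptions are the four-character strings of the form (q0), so `first(1)` returns the encoding that the text gives for M_none.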


17.6.3  Another  Win  of  Encoding 

Our motivation for defining what we mean by <M> was that we would like to be able to input a definition of M to the universal Turing machine U, which will then execute M. But it turns out that, now that we have a well-defined string encoding <M> for any Turing machine M, we can pass <M> as input to programs other than U and ask those programs to operate on M. So we can talk about some Turing machine T that takes the description of another Turing machine (say M1) as input and transforms it into a description of a different machine (say M2) that performs some different, but possibly related task. We show this schematically in Figure 17.9.

We  will  make  extensive  use  of  this  idea  of  transforming  one  Turing  machine  into 
another  when  we  discuss  the  use  of  reduction  to  show  that  various  problems  are 
undecidable. 


FIGURE 17.9  Turing machine T takes one Turing machine as input and creates another as its output. (Input: <M1>. Output: <M2>.)

EXAMPLE 17.21  One Turing Machine Operates on the Description of Another

Define  a  Turing  machine  T  whose  specifications  are: 

Input: <M1>, where M1 is a Turing machine that reads its input tape and performs some operation P on it.

Output: <M2>, where M2 is a Turing machine that performs P on an empty input tape.

The job of T is shown in the following diagram. We have, for convenience here, described <M2> using our macro language, but we could have written out the detailed string encoding of it.


[Diagram: on the left, <M1>; on the right, M2 written in the macro language.]

T constructs the machine M2 that starts by erasing its input tape. Then it passes control to M1. So we can define T as follows:

T(<M1>) =

Output the machine shown on the right above.


17.6.4  Encoding  Multiple  Inputs  to  a  Turing  Machine 

Every Turing machine takes a single string as its input. Sometimes, however, we wish to define a Turing machine that operates on more than one object. For example, we are going to define the universal Turing machine U to accept a machine M and a string w and to simulate the execution of M on w. To do this, we need to encode both arguments as a single string. We can easily do that by encoding each argument separately and then concatenating them together, separated by some character that is not in any of the alphabets used in forming the individual strings. For example, we could encode the pair (<M>, <aabb>) as <M>:<aabb>. We will use the notation <x1, x2, ..., xn> to mean a single string that encodes the sequence of individual values x1, x2, ..., xn.

17.7  The  Universal  Turing  Machine 

We are now in a position to return to the problem of building a universal Turing machine, which we'll call U. U is not truly "universal" in the sense that it can compute “everything.” As we'll see in the next few chapters, there are things that cannot be computed by any Turing machine. U is, however, universal in the sense that, given an arbitrary Turing machine M and an input w, U will simulate the operation of M on w. We can state U's specification as follows: On input <M, w>, U must:

•  Halt iff M halts on w.

•  If  M  is  a  deciding  or  a  semideciding  machine,  then: 

•  If  M  accepts,  accept. 

•  If  M  rejects,  reject. 

•  If M computes a function, then U(<M, w>) must equal M(w).

U will use three tapes to simulate the execution of M on w:

•  Tape  1  will  correspond  to  M’s  tape. 

•  Tape  2  will  contain  <M>,  the  “program”  that  U  is  running. 

•  Tape 3 will contain the encoding of the state that M is in at any point during the simulation. Think of tape 3 as holding the program counter.

When U begins, it will have <M, w> on tape 1. (Like all multitape machines, it starts with its input on tape 1 and all other tapes blank.) Figure 17.10 (a) illustrates U's three tapes when it begins. It uses the multitrack encoding of three tapes that we described in Section 17.3.1.

U's  first  job  is  to  initialize  its  tapes.  To  do  so,  it  must  do  the  following: 

1.  Transfer  <M>  from  tape  1  to  tape  2  (erasing  it  from  tape  1). 

2. Examine <M> to determine the number of states in M and thus i, the number of binary digits required to encode M's states. Write q0^i, that is, q followed by i zeros (corresponding to the start state of M), on tape 3.

Assume that it takes three bits to encode the states of M. Then, after initialization, U's tapes will be as shown in Figure 17.10 (b).

U  begins  simulating  M  with  the  read/write  heads  of  its  three  tapes  as  described 
above.  More  generally,  it  will  start  each  step  of  its  simulation  with  the  read/write  heads 
placed  as  follows: 

•  Tape 1's read/write head will be over the a that is the first character of the encoding of the current character on M's tape.

•  Tape 2's read/write head will be at the beginning of <M>.

•  Tape  3’s  read/write  head  will  be  over  the  q  of  the  program  counter. 

FIGURE 17.10  The tapes of the universal Turing machine U.

Following initialization as described above, U operates as follows:

1. Until M would halt do:

1.1. Scan tape 2 for a quintuple that matches the current state, input pair.

1.2. Perform the associated action, by changing tapes 1 and 3. If necessary, extend the simulated tape that is encoded on tape 1.

1.3. If no matching quintuple found, halt.

2.  Report  the  same  result  M  would  report: 

•  If M is viewed as a deciding or semideciding machine for some language L: If the simulated state of M is y, then accept. If the simulated state is n, then reject.

•  If  M  is  viewed  as  a  machine  that  computes  a  function:  Reformat  the  tape  so 
that  the  value  of  tape  1  is  all  that  is  left. 

How long does it take U to simulate the computation of M? If M would halt in k steps, then U must go through its loop k times. Each time through the loop, it must scan <M> to find out what to do. So U takes O(|M| · k) steps.
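
The main loop of U can be sketched in Python as an interpreter that works directly on the string <M>. This is an illustration under stated assumptions, not the construction itself: the simulated tape is passed in as a list of symbol codes rather than as a single encoded string, the names `universal` and `QUINTUPLE` are hypothetical, and a `max_steps` guard is added because the real U simply runs forever when M does.

```python
import re

# One encoded transition: (state, character, state, character, move).
QUINTUPLE = re.compile(r"\(([qyn][01]+),(a[01]+),([qyn][01]+),(a[01]+),([→←])\)")


def universal(encoded_m, tape, start, blank, max_steps=10_000):
    """Simulate the machine encoded by encoded_m; return (final state, tape codes)."""
    program = QUINTUPLE.findall(encoded_m)    # tape 2: the "program" <M>
    cells = dict(enumerate(tape))             # tape 1: M's tape, one code per square
    state, head = start, 0                    # tape 3 and the simulated head position
    for _ in range(max_steps):
        current = cells.get(head, blank)
        for q, c, q2, c2, move in program:    # step 1.1: linear scan of <M>
            if (q, c) == (state, current):
                state, cells[head] = q2, c2   # step 1.2: update tapes 3 and 1
                head += 1 if move == "→" else -1
                break
        else:                                 # step 1.3: no matching quintuple
            return state, [cells[i] for i in sorted(cells)]
    raise RuntimeError("step limit reached; M may not halt")
```

The inner `for` loop rescans the whole program on every simulated step, which is where the O(|M| · k) bound comes from. A caller can then report acceptance or rejection by looking at whether the returned state code begins with y or n.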

Now  we  know  that  if  we  wanted  to  build  real  Turing  machines  we  could  build  one 
physical  machine  and  feed  it  descriptions  of  any  other  Turing  machines  that  we  wanted 
to  run.  So  this  is  yet  another  way  in  which  the  Turing  machine  is  a  good  general  model 
of  computation. 

The  existence  of  U  enables  us  to  prove  the  following  theorem: 

THEOREM  17.8  One  Turing  Machine  Can  Simulate  Another 

Theorem: Given any Turing machine M and input string w, there exists a Turing
machine M' that simulates the execution of M on w and:

• halts iff M halts on w, and

• if it halts, returns whatever result M returns.

Proof: Given a particular M and w, we construct a specific M' to operate as follows:

M'(x) =

Invoke the universal Turing machine U on the string <M, w>.

Notice that M' ignores its own input (which we’ve called x). It is a constant
function. M' halts iff U halts and, if it halts, will return the result of executing M
on w.


Theorem 17.8 enables us to write, in a Turing machine definition, the pseudocode,
“Run M on w,” and then branch based on whether or not M halts (and, if it halts, what
it returns).

If the universal Turing machine is a good idea, what about universal other things?
Could we, for example, define a universal FSM? Such an FSM would accept the
language L = {<F, w> : F is a finite state machine and w ∈ L(F)}. The answer is no.
Since any FSM has only a finite amount of memory, it has no way to remember and
then execute a program of arbitrary length. We have waited until now to introduce the
idea of a universal machine because we had to.


Exercises 

1. Give a short English description of what each of these Turing machines does:

a. Σ = {a, b}. M = [machine diagram not recoverable from the source]

b. Σ = {a, b}. M = [machine diagram not recoverable from the source]

2. Construct a standard, deterministic, one-tape Turing machine M to decide each of
the following languages L. You may find it useful to define subroutines. Describe
M in the macro language defined in Section 17.1.5.

a. {x * y = z : x, y, z ∈ 1+ and, when x, y, and z are viewed as unary numbers,
xy = z}. For example, the string 1111 * 11 = 11111111 ∈ L.

b. {a^i b^j c^i d^j : i, j ≥ 0}.

c. {w ∈ {a, b, c, d}* : #b(w) ≥ #c(w) ≥ #d(w) ≥ 0}.

3.  Construct  a  standard,  deterministic,  one-tape  Turing  machine  M  to  compute  each 
of  the  following  functions: 

a. The function sub3, which is defined as follows:

sub3(n) = n - 3 if n > 2
          0 if n ≤ 2.

Specifically, compute sub3 of a natural number represented in binary. For
example, on input 10111, M should output 10100. On input 11101, M should
output 11010. (Hint: You may want to define a subroutine.)

b. Addition of two binary natural numbers (as described in Example 17.13).
Specifically, given the input string <x>;<y>, where <x> is the binary
encoding of a natural number x and <y> is the binary encoding of a natural
number y, M should output <z>, where z is the binary encoding of x + y.
For example, on input 101;11, M should output 1000.

c. Multiplication of two unary numbers. Specifically, given the input string
<x>;<y>, where <x> is the unary encoding of a natural number x and
<y> is the unary encoding of a natural number y, M should output <z>,
where z is the unary encoding of xy. For example, on input 111;1111, M
should output 111111111111.

d. The proper subtraction function monus, which is defined as follows:

monus(n, m) = n - m if n > m
              0 if n ≤ m.

Specifically, compute monus of two natural numbers represented in binary.
For example, on input 101;11, M should output 10. On input 11;101, M
should output 0.

4. Construct a Turing machine M that computes the function f: {a, b}* → N, where:

f(x) = the unary encoding of max(#a(x), #b(x)).

For example, on input aaaabb, M should output 1111. M may use more than
one tape. It is not necessary to write the exact transition function for M. Describe
it in clear English.

5. Construct a Turing machine M that converts binary numbers to their unary
representations. So, specifically, on input <w>, where w is the binary encoding of a
natural number n, M will output 1^n. (Hint: Use more than one tape.)

6. Let M be a three-tape Turing machine with Σ = {a, b, c} and Γ = {a, b, c, □, 1, 2}.
We want to build an equivalent one-tape Turing machine M' using the technique
described in Section 17.3.1. How many symbols must there be in Γ'?

7. In Example 13.2, we showed that the language L = {a^(n^2) : n ≥ 0} is not
context-free. Show that it is in D by describing, in clear English, a Turing machine that
decides it. (Hint: Use more than one tape.)

8. In Example 17.9, we showed a Turing machine that decides the language WcW. If
we remove the middle marker c, we get the language WW. Construct a Turing
machine M that decides WW. You may want to exploit nondeterminism. It is not
necessary to write the exact transition function for M. Describe it in clear English.

9. In Example 4.9, we described the Boolean satisfiability problem and we sketched
a nondeterministic program that solves it using the function choose. Now define
the language SAT = {<w> : w is a wff in Boolean logic and w is satisfiable}.
Describe in clear English the operation of a nondeterministic (and possibly
n-tape) Turing machine that decides SAT.

10.  Prove  Theorem  17.3. 

11.  Prove  rigorously  that  the  set  of  regular  languages  is  a  proper  subset  of  D. 

12.  In  this  question,  we  explore  the  equivalence  between  function  computation  and 
language recognition as performed by Turing machines. For simplicity, we will
consider only functions from the nonnegative integers to the nonnegative integers
(both encoded in binary). But the ideas of these questions apply to any computable
function. We'll start with the following definition:

• Define the graph of a function f to be the set of all strings of the form [x, f(x)],
where x is the binary encoding of a nonnegative integer, and f(x) is the binary
encoding of the result of applying f to x.

For example, the graph of the function succ is the set {[0, 1], [1, 10], [10, 11], ...}.

a. Describe in clear English an algorithm that, given a Turing machine M that
computes f, constructs a Turing machine M' that decides the language L that
contains exactly the graph of f.

b. Describe in clear English an algorithm that, given a Turing machine M that
decides the language L that contains the graph of some function f, constructs
a Turing machine M' that computes f.

c. A function is said to be partial if it may be undefined for some arguments. If we
extend the ideas of this exercise to partial functions, then we do not require that
the Turing machine that computes f halt if it is given some input x for which f(x)
is undefined. Then L (the graph language for f) will contain entries of the form
[x, f(x)] for only those values of x for which f is defined. In that case, it may not be
possible to decide L, but it will be possible to semidecide it. Do your
constructions for parts (a) and (b) work if the function f is partial? If not, explain how you
could modify them so they will work correctly. By “work”, we mean:

• For part (a): Given a Turing machine that computes f(x) for all values on
which f is defined, build a Turing machine that semidecides the language L
that contains exactly the graph of f.

• For part (b): Given a Turing machine that semidecides the graph language
of f (and thus accepts all strings of the form [x, f(x)] when f(x) is defined),
build a Turing machine that computes f.

13. What is the minimum number of tapes required to implement a universal Turing
machine?

14. Encode the following Turing machine as an input to the universal Turing machine
that is described in Section 17.7:

M = (K, Σ, Γ, δ, q0, {h}), where:

K = {q0, q1, h},

Σ = {a, b},

Γ = {a, b, c, □}, and

δ is given by the following table:


The Church-Turing Thesis


The  Turing  machine  is  the  most  powerful  of  the  models  of  computation  that  we 
have  so  far  considered.  There  are  problems  that  can  be  solved  by  a  Turing 
machine  that  cannot  be  solved  by  a  PDA,  just  as  there  are  problems  that  could 
be  solved  by  a  PDA  but  not  by  an  FSM.  Is  this  the  end  of  the  line,  or  should  we  expect 
a  sequence  of  even  more  powerful  models? 

One  way  of  looking  at  things  suggests  that  we  should  expect  to  keep  going.  A  simple 
counting argument shows that there are more languages than there are Turing machines:

• There is at most a countably infinite number of Turing machines since we can
lexicographically enumerate all the strings that correspond to syntactically legal Turing
machines.

•  There  is  an  uncountably  infinite  number  of  languages  over  any  nonempty  alphabet. 

•  Thus  there  are  more  languages  than  there  are  Turing  machines. 
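The enumeration behind the first bullet can be made concrete with a short Python sketch. It lists every finite string over a given alphabet, shorter strings first and alphabetically within each length, so each string appears at some finite position. The function name `all_strings` is hypothetical, and the step of filtering the strings down to those that are syntactically legal machine descriptions is omitted because it depends on the chosen encoding.

```python
from itertools import count, islice, product


def all_strings(alphabet):
    """Yield every finite string over alphabet: by length, then alphabetically."""
    symbols = sorted(alphabet)
    for length in count(0):
        for letters in product(symbols, repeat=length):
            yield "".join(letters)


# The first few strings over {a, b}; any particular string is reached eventually.
first_seven = list(islice(all_strings("ab"), 7))
print(first_seven)  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

No comparable listing exists for the languages over the alphabet, which is what the second bullet asserts.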

So there are languages that cannot be recognized by any Turing machine. But can
we do better by creating some new formalism? If any such new formalism shares with
Turing machines the property that each instance of it has a finite description (for
example a finite length Java program or a finite length grammar) then the same argument
will apply to it and there will still be languages that it cannot describe.

But there might be some alternative model in which we could write finite length
programs and for which no equivalent Turing machine exists. Is there? We showed in the last
chapter that there are several features (e.g., multiple tapes, nondeterminism) that we
could add to our definition of a Turing machine without increasing its power. But does
that mean that there is nothing we could add that would make a difference? Or might
there be some completely different model that has more power than the Turing machine?


18.1  The  Thesis 

Another way to ask the question about the existence of a more powerful model is this:
Recall that we have defined an algorithm to be a detailed procedure that accomplishes
some clearly specified task. Note that this definition is general enough to include
decision procedures (functions that return Boolean values), as well as functions that return
values of other types. In fact, it is general enough to include recipes for beef
Wellington. We will, however, focus just on tasks that involve computation. Now we can restate
our question: “Is there any computational algorithm that cannot be implemented by
some Turing machine? Then, if there is, can we find some more powerful model in
which we could implement that algorithm?” Note that we are assuming here that both
real-world inputs and real-world outputs can be appropriately encoded into symbols
that can be written onto a device such as the Turing machine’s tape. We are not talking
about whether an abstract Turing machine can actually chop mushrooms, take pictures,
produce sound waves, or turn a steering wheel.

During the first third of the 20th century, a group of influential mathematicians was
focused on developing a completely formal basis for mathematics. Out of this effort
emerged, among other things, Principia Mathematica [Whitehead and Russell 1910,
1912, 1913], which is often described as the most influential work on logic ever written.
Among its achievements was the introduction of a theory of types that offers a way out
of Russell’s paradox.* The continuation and the ultimate success of this line of work
depended on positive answers to two key questions:

1. Is it possible to axiomatize all of the mathematical structures of interest in such a
way that every true statement becomes a theorem? We will allow the set of axioms
to be infinite, but it must be decidable. In other words, there must exist an
algorithm that can examine a string and determine whether or not it is an axiom.

2. Does there exist an algorithm to decide, given a set of axioms, whether a given
statement is a theorem? In other words, does there exist an algorithm that always
halts and that returns True when given a theorem and False otherwise?


Principia Mathematica played a landmark role in the development of
mathematical logic in the early part of the 20th century. Forty-five years later it
played another landmark role, this time in a discipline that Whitehead and
Russell could never have imagined. In 1956, the Logic Theorist, often
regarded as the first artificial intelligence program, proved most of the
theorems in Chapter 2 of Principia Mathematica. (M.2.2)


It  was  widely  believed  that  the  answer  to  both  of  these  questions  was  yes.  Had  it 
been, perhaps the goal of formalizing all of mathematics could have been attained. But
the  answer  to  both  questions  is  no.  Three  papers  that  appeared  within  a  few  years  of 
each  other  shattered  that  dream. 

Kurt Gödel showed, in the proof of his Incompleteness Theorem [Gödel 1931], that
the answer to question 1 is no. In particular, he showed that there exists no decidable
axiomatization of Peano arithmetic (the natural numbers plus the operations plus and
times) that is both consistent and complete. By complete we mean that all true
statements in the language of the theory are theorems. Note that an infinite set of axioms is
allowed, but it must be decidable. So an infinite number of true statements can be
made theorems simply by adding new axioms. But Gödel showed that, no matter how
often that is done, there must remain other true statements that are unprovable.

*Let M be “the set of all sets that are not members of themselves.” Is M a member of M? The fact that either
answer to this question leads to a contradiction was noticed by Bertrand Russell in about 1901. The question
is called “Russell's paradox.”

Question 2 had been clearly articulated a few years earlier in a paper by David
Hilbert and Wilhelm Ackermann [Hilbert and Ackermann 1928]. They called it the
Entscheidungsproblem. (Entscheidungsproblem is German for “decision problem.”)
There are three equivalent ways to state the problem:

• “Does there exist an algorithm to decide, given an arbitrary sentence w in
first-order logic, whether w is valid (i.e., true in all interpretations)?”

• “Given a set of axioms A and a sentence w, does there exist an algorithm to decide
whether w is entailed by A?” Note that this formulation is equivalent to the first
one since the sentence A → w is valid iff w is entailed by A.

• “Given a set of axioms A and a sentence w, does there exist an algorithm to decide
whether w can be proved from A?” Note that this formulation is equivalent to the
second one since Gödel's Completeness Theorem tells us that there exists, for
first-order logic, an inference procedure that is powerful enough to derive, from A, every
sentence that is entailed by A.

Note that questions 1 and 2 (i.e., “Can the facts be axiomatized?” and “Can
theoremhood be decided?”), while related, are different in an important way. The fact that
the answer to question 1 is no does not obviously imply that the answer to the
Entscheidungsproblem is no. While some true statements are not theorems, it might
still have turned out to be possible to define an algorithm that distinguishes theorems
from nontheorems.

The  Entscheidungsproblem  had  captured  the  attention  of  several  logicians  of  the 
time, including Alan Turing and Alonzo Church. Turing and Church, working
independently, realized that, in order to solve the Entscheidungsproblem, it was necessary
first to formalize what was meant by an algorithm. Turing’s formalization was what we
now call a Turing machine. Church's formalization was the lambda calculus, which we
will discuss briefly below. The two formalizations look very different. But Turing
showed that they are equivalent in power. Any problem that can be solved in one can
be solved in the other. As it turns out ([Turing 1936] and [Church 1936]), the
Entscheidungsproblem can be solved in neither. We'll see why this is so in Chapter 19.

But out of the negative results that formed the core of the Church and Turing papers
emerged an important new idea: Turing machines and the lambda calculus are
equivalent. Perhaps that observation can be extended.

The Church-Turing thesis, or sometimes just Church's thesis, states that all
formalisms powerful enough to describe everything we think of as a computational
algorithm are equivalent.

We  should  point  out  that  this  statement  is  stronger  than  anything  that  either  Church 
or Turing actually said. This version is based on a substantial body of work that has
occurred since Turing and Church's seminal papers. Also note that we have carefully used
the word thesis here, rather than theorem. There exists no proof of the Church-Turing
thesis because its statement depends on our informal definition of a computational
algorithm. It is in principle possible that someone may come up with a more powerful model.
Many very different models have been proposed over the years. We will examine a few of
them below. All have been shown to be no more powerful than the Turing machine.

The Church-Turing thesis is significant. In the next several chapters, we are going to
prove  that  there  are  important  problems  whose  solutions  cannot  be  computed  by  any 
Turing  machine.  The  Church-Turing  thesis  tells  us  that  we  should  not  expect  to  find 
some  other  reasonable  computational  model  in  which  those  same  problems  can  be 
solved.  Moreover,  the  equivalence  proofs  that  support  the  thesis  tell  us  that  it  is  certain 
that  those  problems  cannot  be  solved  in  any  of  the  specific  computational  models  that 
have  so  far  been  considered  and  compared  to  the  Turing  machine. 

18.2  Examples  of  Equivalent  Formalisms  ♦ 

All  of  the  following  models  have  been  shown  to  be  equivalent  to  our  basic  definition  of 
a  Turing  machine: 

•  Modern  computers,  if  we  assume  that  there  is  an  unbounded  amount  of  memory 
available. 

•  Lambda  calculus. 

•  Partial  recursive  functions  (in  which  the  class  of  computable  functions  is  built  from  a 
small  number  of  primitive  functions  and  a  small  set  of  combining  operations). 

•  Tag  systems  (in  which  we  augment  an  FSM  with  a  FIFO  queue  rather  than  a  stack). 

•  Unrestricted  grammars  (in  which  we  remove  the  constraint  that  the  left-hand  side 
of  each  production  must  consist  of  just  a  single  nonterminal). 

•  Post  production  systems  (in  which  we  allow  grammar-like  rules  with  variables). 

•  Markov  algorithms. 

•  Conway’s  Game  of  Life. 

•  One  dimensional  cellular  automata. 

•  Various  theoretical  models  of  DNA-based  computing. 

•  Lindenmayer  systems. 

We will describe recursive functions in Chapter 25, unrestricted grammars in
Chapter 23, and Lindenmayer systems (also called L-systems) in Section 24.4. In the
remainder of this chapter, we will briefly discuss the others.

18.2.1  Modern  Computers 

We  showed  in  Section  17.4  that  the  functionality  of  modern  “real”  computers  can  be 
implemented  with  Turing  machines.  This  observation  suggests  a  slightly  different  way 
to  define  the  decidable  languages  (i.e.,  those  that  are  in  D).  A  language  L  is  decidable 
if  there  exists  a  decision  procedure  for  it. 

18.2.2  Lambda  Calculus 

Alonzo Church developed the lambda calculus as a way to formalize the notion of
an algorithm. While Turing’s solution to that same problem has the feel of a procedure,
Church's solution feels more like a mathematical specification.

The lambda calculus is the basis for modern functional programming
languages like Lisp, Scheme, ML, and Haskell. (G.5)


The lambda calculus is an expression language. Each expression defines a function of
a single argument, which is written as a variable bound by the operator λ. For example,
the following simple lambda calculus expression describes the successor function:

(λ x. x + 1)

Functions can be applied to arguments by binding each argument to a formal
parameter. So:

(λ x. x + 1) 3

is evaluated by binding 3 to x and computing the result, 4.

Functions may be arguments to other functions and the value that is computed by a
function may be another function. One of the most common uses of this feature is to
define functions that we may think of as taking more than one argument. For example,
we can define a function to add two numbers by writing:

(λ x. λ y. x + y)

Function application is left associative. So we can apply the addition function that
we just described by writing, for example:

(λ x. λ y. x + y) 3 4

This expression is evaluated by binding 3 to x to create the new function (λ y. 3 + y),
which is then applied to 4 to return 7.
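The same expressions can be written almost verbatim with Python's `lambda`, which may help readers who want to try the evaluation steps themselves. This is only an analogy: Python has built-in numbers and eager evaluation, and the names `successor`, `add` and `add_three` are introduced here for illustration.

```python
# (λ x. x + 1)
successor = lambda x: x + 1

# (λ x. λ y. x + y): a function of x that returns a function of y
add = lambda x: lambda y: x + y

# Binding 3 to x yields the new function (λ y. 3 + y) ...
add_three = add(3)

# ... which is then applied to 4.
print(successor(3))   # 4
print(add_three(4))   # 7
print(add(3)(4))      # 7, application groups to the left: (add 3) 4
```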

In  the  pure  lambda  calculus,  there  is  no  built-in  data  type  number.  All  expressions 
are  functions.  But  the  natural  numbers  can  be  defined  as  lambda  calculus  functions.  So 
the  lambda  calculus  can  effectively  describe  numeric  functions  just  as  we  have  done. 
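One standard way to define the natural numbers as functions is the Church numeral encoding, in which the number n is the function that applies its argument n times. The Python sketch below is an illustration of that encoding rather than something taken from this text; the names `zero`, `church_succ`, `church_plus` and `to_int` are hypothetical, and `to_int` steps outside the pure calculus only to display results.

```python
# The numeral n is "apply f to x, n times".
zero = lambda f: lambda x: x
church_succ = lambda n: lambda f: lambda x: f(n(f)(x))
church_plus = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))


def to_int(numeral):
    """Convert a Church numeral to a Python int by counting applications."""
    return numeral(lambda k: k + 1)(0)


three = church_succ(church_succ(church_succ(zero)))
four = church_succ(three)

print(to_int(three))                     # 3
print(to_int(church_plus(three)(four)))  # 7
```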

The  lambda  calculus  can  be  shown  to  be  equivalent  in  power  to  the  Turing  machine. 
In  other  words,  the  set  of  functions  that  can  be  defined  in  the  lambda  calculus  is  equal 
to the set of functions that can be computed by a Turing machine. Because of this
equivalence, any problem that is undecidable for Turing machines is also undecidable
for the lambda calculus. For example, we'll see in Chapter 21 that it is undecidable
whether two Turing machines are equivalent. It is also undecidable whether two
expressions in the lambda calculus are equivalent. In fact, Church’s proof of that result
was the first formal undecidability proof. (It appeared months before Turing’s proof of
the undecidability of questions involving Turing machines.)


18.2.3  Tag  Systems 

In the 1920s, a decade or so before the pioneering work of Gödel, Turing, and Church
was published, the Polish-born American logician Emil Post began working on the
decidability of logical theories. Out of his work emerged two formalisms that are now known
to be equivalent to the Turing machine. We’ll mention the first, tag systems, here and the
second, Post production systems, in the next section.

Post, and others, defined various versions (with differing restrictions on the alphabet
and on the form of the operations that are allowed) of the basic tag system
architecture. We describe the simplest here: A tag system, sometimes now called a Post
machine, is a finite state machine that is augmented with a first-in, first-out (FIFO)
queue. In other words, it’s a PDA with a FIFO queue rather than a stack.

It is easy to see that there are languages that are not context-free (and so cannot be
accepted by any PDA) but that can be accepted by a tag system. Recall that while
PalEven = {ww^R : w ∈ {a, b}*} is context-free, its cousin, WW = {ww : w ∈ {a, b}*},
in which the second half of the string is not reversed, is not context-free. We could not
build a PDA for WW because, using a stack, there was no way to compare the characters
in the second half of a string to the characters in the first half except by reversing them.
If we can use a FIFO queue instead of a stack, we no longer have this problem. So a
simple tag system to accept WW writes the first half of its input string into its queue and
then removes characters from the head of the queue, one at a time, and checks each of
them against the characters in the second half of the input string.
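The queue-based check for WW can be sketched in Python with `collections.deque`. One assumption needs stating: a machine that reads its input left to right cannot know where the first half ends, so the sketch replaces a nondeterministic guess of the midpoint with a loop over every possible split point. The function name `accepts_ww` is hypothetical.

```python
from collections import deque


def accepts_ww(s: str) -> bool:
    """Return True iff s = ww for some w, using only FIFO queue operations."""
    for mid in range(len(s) + 1):        # stand-in for guessing the midpoint
        queue = deque(s[:mid])           # write the guessed first half into the queue
        matched = True
        for ch in s[mid:]:               # compare against the rest of the input
            if not queue or queue.popleft() != ch:
                matched = False
                break
        if matched and not queue:        # both halves exhausted together
            return True
    return False


print(accepts_ww("abbabb"))  # True:  w = abb
print(accepts_ww("abba"))    # False: abba is in PalEven but not in WW
```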

But  have  we  simply  traded  one  set  of  languages  for  another?  Or  can  we  build  a  tag 
system to accept PalEven as well as WW? The answer is that, while there is not a
simple tag system to accept PalEven, there is a tag system. In fact, any language that can be
accepted by a Turing machine can also be accepted by a tag system. To see why, we’ll
sketch a technique for simulating a Turing machine with a tag system. Let the tag
system’s queue correspond to the Turing machine’s active tape plus a blank on either side
and let the head of the tag system’s queue contain the square that is under the Turing
machine’s read/write head.

Now  we  just  need  a  way  to  move  both  left  and  right  in  the  queue,  which  would  be 
easy  if  the  tag  system’s  queue  were  a  loop  (i.e.,  if  its  front  and  back  were  glued  together). 
It  isn't  a  loop,  but  we  can  treat  it  as  though  it  were.  To  simulate  a  Turing  machine  that 
moves  its  head  one  square  to  the  right,  remove  the  symbol  at  the  head  of  the  queue  and 
add  it  to  the  tail.  To  simulate  a  Turing  machine  that  moves  its  head  one  square  to  the 
left,  consider  a  queue  that  contains  n  symbols.  One  at  a  time,  remove  the  first  n  —  1 
symbols from the head of the queue and add them to the tail. To simulate a Turing
machine that moves onto the blank region of its tape, exploit the fact that a tag system is
allowed to push more than one symbol onto the end of its queue. So push two, one of
which corresponds to the newly nonblank square.
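The two head movements can be checked with a small Python sketch that treats a `collections.deque` as the tag system's queue, using only "remove from the head" and "add to the tail". The function names are hypothetical, and the sketch deliberately leaves out the extension onto the blank region, so a move left from the first square simply wraps around to the last one.

```python
from collections import deque


def move_right(queue: deque) -> None:
    """Simulate a right move: the head symbol goes to the tail."""
    queue.append(queue.popleft())


def move_left(queue: deque) -> None:
    """Simulate a left move: cycle the first n - 1 symbols to the tail."""
    for _ in range(len(queue) - 1):
        queue.append(queue.popleft())


tape = deque("abcd")      # the head of the queue, 'a', is the scanned square
move_right(tape)
print("".join(tape))      # bcda: now scanning 'b'
move_left(tape)
print("".join(tape))      # abcd: back to scanning 'a'
```

A left move costs n - 1 queue operations where a right move costs one, which affects the running time of the simulation but not what it can compute.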


18.2.4  Post  Production  Systems 

We  next  consider  a  second  formalism  that  is  derived  from  Post’s  early  work.  This  one  is 
based  on  the  idea  of  a  rewrite  or  production  or  rule-based  system.  A  Post  production 
system (or simply Post system), as such systems have come to be known (although Post
never called them that), shares with the grammar formalisms that we have considered
the property that computation is accomplished by applying a set of production rules
whose left-hand sides are matched against a current working string and whose
right-hand sides are used to rewrite the working string.

Post’s early work inspired the development of many modern rule-based
systems, including context-free grammars described in BNF (G.1.1), rule-based
expert systems (M.3), production rule-based cognitive architectures (M.3.2),
and rule-based specifications for the behavior of NPCs in interactive games
(N.3.3).


Based on the ideas described in Post’s work, we define a Post system P to be a
quintuple (V, Σ, X, R, S), where:

• V is the rule alphabet, which contains nonterminal and terminal symbols,

• Σ (the set of terminals) is a subset of V,

• X is a set of variables whose values are drawn from V*,

• R (the set of rules) is a finite subset of (V ∪ X)* × (V ∪ X)*, with the additional
constraint that every variable that occurs on the right-hand side of a rule must also
have occurred on the left-hand side, and

• S (the start symbol) can be any element of V - Σ.

There are three important differences between Post systems, as just defined, and both
the regular and context-free grammar formalisms that we have already considered:

1.  In  a  Post  system,  the  left-hand  side  of  a  rule  may  contain  two  or  more  symbols. 

2.  In  a  Post  system,  rules  may  contain  variables.  When  a  variable  occurs  on  the  left- 
hand  side  of  a  rule,  it  may  match  any  element  of  V  *.  When  a  variable  occurs  on 
the  right-hand  side,  it  will  generate  whatever  value  it  matched. 

3.  In  a  Post  system,  a  rule  may  be  applied  only  if  its  left-hand  side  matches  the 
entire  working  string.  When  a  rule  is  applied,  the  entire  working  string  is 
replaced  by  the  string  that  is  specified  by  the  rule’s  right-hand  side.  Note  that 
this  contrasts  with  the  definition  of  rule  application  that  we  use  in  our  other 
rule-based  formalisms.  In  them,  a  rule  may  match  any  substring  of  the  working 
string  and  just  that  substring  is  replaced  as  directed  by  the  rule’s  right-hand 
side. So, suppose that we wanted to write a rule A → B that replaced an A
anywhere in the string with a B. We would have to write instead the rule
XAY → XBY. The variables X and Y can match everything before the A and
after it, respectively.

As with regular and context-free grammars, let x ⇒_P y mean that the string y can be
derived from the string x by applying a single rule in R_P. Let x ⇒_P* y mean that y can
be derived from x by applying zero or more rules in R_P. The language generated by P,
denoted L(P), is {w ∈ Σ* : S ⇒_P* w}.

EXAMPLE 18.1 A Post System for WW

Recall the language WW = {ww : w ∈ {a, b}*}, which is in D (i.e., it is decidable)
but is not context-free. We can build a Post system P that generates WW.
P = ({S, a, b}, {a, b}, {X}, R, S), where R =

(1) XS → XaS    /* Generate (a ∪ b)* S.
(2) XS → XbS
(3) XS → XX     /* Create a second copy of X.

This Post system can generate, for example, the string abbabb. It does so as follows:

S ⇒ (using rule (1) and letting X match ε)
aS ⇒ (using rule (2) and letting X match a)
abS ⇒ (using rule (2) and letting X match ab)
abbS ⇒ (using rule (3) and letting X match abb)
abbabb
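
The derivation above can be replayed mechanically. The following is a minimal Python sketch, not a general Post system engine: it assumes every symbol is a single character, it turns each left-hand side into a regular expression that must match the entire working string, and it takes whichever single variable binding the regex engine finds (enough for this system, where the binding of X is always unique). The names `RULES`, `apply_rule`, and `generate` are illustrative and do not come from the text.

```python
import re
from collections import deque

VARIABLES = {"X"}
TERMINALS = {"a", "b"}
# Rules (1)-(3) of Example 18.1, as (left-hand side, right-hand side) pairs.
RULES = [("XS", "XaS"), ("XS", "XbS"), ("XS", "XX")]


def lhs_to_regex(lhs):
    """Compile a left-hand side; each variable may match any string."""
    parts, seen = [], set()
    for sym in lhs:
        if sym in VARIABLES and sym in seen:
            parts.append(f"(?P={sym})")      # repeated variable: same value
        elif sym in VARIABLES:
            parts.append(f"(?P<{sym}>.*)")   # first occurrence: bind it
            seen.add(sym)
        else:
            parts.append(re.escape(sym))
    return re.compile("".join(parts))


def apply_rule(rule, working):
    """Apply one rule to the ENTIRE working string, or return None."""
    lhs, rhs = rule
    m = lhs_to_regex(lhs).fullmatch(working)
    if m is None:
        return None
    return "".join(m.group(s) if s in VARIABLES else s for s in rhs)


def generate(max_len):
    """All terminal strings of length <= max_len derivable from S."""
    results, seen, queue = set(), {"S"}, deque(["S"])
    while queue:
        w = queue.popleft()
        if all(c in TERMINALS for c in w):
            if len(w) <= max_len:
                results.add(w)
            continue
        for rule in RULES:
            nxt = apply_rule(rule, w)
            # Allow one extra character for the trailing S.
            if nxt is not None and len(nxt) <= max_len + 1 and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return results


if __name__ == "__main__":
    print(sorted(generate(4)))
```

Because a rule must match the whole working string, `fullmatch` is used rather than a substring search; that is the third difference from ordinary grammars listed earlier.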


Post systems, as we have just defined them, are equivalent in power to Turing
machines. The set of languages that can be generated by a Post system is exactly SD, the
set of semidecidable languages. The proof of this claim is by construction. For any Post
system P, it is possible to build a Turing machine M that simulates P. And, for any
Turing machine M, it is possible to build a Post system P that simulates M.


18.2.5  Unrestricted  Grammars 

While the availability of variables in Post systems is convenient, variables are not
actually required to give Post systems their power. In Chapter 23, we will describe another
formalism that we will call an unrestricted grammar. The rules in an unrestricted
grammar may not contain variables, but their left-hand sides may contain any number of
terminal and nonterminal symbols, subject to the sole constraint that there be at least
one symbol. Unrestricted grammars have exactly the same power as do Post systems
and Turing machines. They can generate exactly the semidecidable (SD) languages. In
Example 23.3 we’ll show an unrestricted grammar that generates WW (the language
we considered above in Example 18.1).


18.2.6  Markov  Algorithms 

Next we consider yet another formalism based on rewrite rules. A Markov algorithm
(named for its inventor, Andrey A. Markov, Jr., the son of the inventor of the
stochastic Markov model that we described in Section 5.11.1) is simply an ordered list of
rules, each of which has a left-hand side that is a single string and a right-hand side
that is also a single string. Formally, a Markov algorithm M is a triple (V, Σ, R),
where:

• V is the rule alphabet, which contains both working symbols and input symbols.
Whenever the job of M is to semidecide or decide a language (as opposed to
compute a function), V will contain two special working symbols, Accept and Reject.

• Σ (the set of input symbols) is a subset of V, and

• R (the rules) is an ordered list of rules, each of which is an element of V* × V*.
There are two kinds of rules, continuing and terminating. Whenever a terminating
rule is applied, the algorithm halts. We will write continuing rules, as usual, as
X → Y. We will write terminating rules by adding a dot after the arrow. So we will
have X →• Y.

Notice that there is no start symbol. Markov algorithms, like Turing machines, are
given an input string. The job of the algorithm is to examine its input and return the
appropriate result.

The  rules  are  interpreted  by  the  following  algorithm: 

Markovalgorithm(M: Markov algorithm, w: input string) =

1. Until no rules apply or the process has been terminated by executing a terminating
rule do:

   1.1. Find the first rule in the list R that matches against w. If that rule
   matches w in more than one place, choose the leftmost match.

   1.2. If no rule matches then exit.

   1.3. Apply the matched rule to w by replacing the substring that matched
   the rule’s left-hand side with the rule’s right-hand side.

   1.4. If the matched rule is a terminating rule, exit.

2. If w contains the symbol Accept then accept.

3. If w contains the symbol Reject then reject.

4. Otherwise, return w.

Notice  that  a  Markov  algorithm  (unlike  a  program  in  any  of  the  other  rule-based 
formalisms  that  we  have  considered  so  far)  is  completely  deterministic.  At  any  step, 
either  no  match  exists,  in  which  case  the  algorithm  halts,  or  exactly  one  match  can  be 
selected. 


The logic programming language Prolog executes programs (sets of rules) in
very much the same way that the Markov algorithm interpreter does. Programs
are deterministic and programmers control the order in which rules
are applied by choosing the order in which to write them. (M.2.3)

The Markov algorithm formalism is equivalent in power to the Turing machine. This
means that Markov algorithms can semidecide exactly the set of SD languages (in
which case they may accept or reject) and they can compute exactly the set of
computable functions (in which case they may return a value). The proof of this claim is by
construction: It is possible to show that a Markov algorithm can simulate the universal
Turing machine U, and vice versa.


EXAMPLE 18.2 A Markov Algorithm for AⁿBⁿCⁿ

We show a Markov algorithm M to decide the language AⁿBⁿCⁿ = {aⁿbⁿcⁿ : n ≥ 0}.
Let M = ({a, b, c, #, %, ?, Accept, Reject}, {a, b, c}, R), where R =

1.  #a → %          /* If the first character is an a, erase it and look for a b next.
2.  #b →• Reject    /* If the first character is a b, reject.
3.  #c →• Reject    /* If the first character is a c, reject.
4.  %a → a%         /* Move the % past the a’s until it finds a b.
5.  %b → ?          /* If it finds a b, erase it and look for a c next.
6.  % →• Reject     /* No b found. Just c’s or end of string. Reject.
7.  ?b → b?         /* Move the ? past the b’s until it finds a c.
8.  ?c → ε          /* If it finds a c, erase it. Then only rule (11) can fire next.
9.  ? →• Reject     /* No c found. Just a’s or b’s or end of string. Reject.
10. # →• Accept     /* A # was created but there are no input characters left. Accept.
11. ε → #           /* This one goes first since none of the others can.

When M begins, the only rule that can fire is 11, since all the others must
match some working symbol. So rule 11 matches at the far left of the input
string and adds a # to the left of the string. If the first input character is an a, it
will be picked up by rule 1, then erased and replaced by a new working symbol %.
The job of the % is to sweep past any other a’s and find the first b. If there is no b
or if a c comes first, M will reject. If there is a b, it will be picked up by rule 5,
then erased and replaced by a third working symbol ?, whose job is to sweep
past any remaining b’s and find the first c. If there is no c, M will reject. If there
is, it will be erased by rule 8. At that point, there are no remaining working
symbols, so the only thing that can happen is that rule 11 fires and the process
repeats until all matched sets of a’s, b’s, and c’s have been erased. If that
happens, the final # that rule 11 adds will be the only symbol left. Rule 10 will fire
and accept.
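
The interpreter loop and the rule list of Example 18.2 can be tried out together. The sketch below is one possible Python transcription, under a few assumptions that are not part of the text: the working string is an ordinary Python string, Accept and Reject are written as the literal words, each rule is a hypothetical `(lhs, rhs, terminating)` triple, and a `max_steps` guard is added because a Markov algorithm need not halt.

```python
from typing import List, Tuple

# (left-hand side, right-hand side, is_terminating); list order is priority.
Rule = Tuple[str, str, bool]

ANBNCN: List[Rule] = [
    ("#a", "%", False),       # 1
    ("#b", "Reject", True),   # 2
    ("#c", "Reject", True),   # 3
    ("%a", "a%", False),      # 4
    ("%b", "?", False),       # 5
    ("%", "Reject", True),    # 6
    ("?b", "b?", False),      # 7
    ("?c", "", False),        # 8
    ("?", "Reject", True),    # 9
    ("#", "Accept", True),    # 10
    ("", "#", False),         # 11: the empty string matches at the far left
]


def run_markov(rules: List[Rule], w: str, max_steps: int = 10_000) -> str:
    """Apply the first matching rule at its leftmost match until no rule
    matches or a terminating rule has been applied; return the final string."""
    for _ in range(max_steps):
        for lhs, rhs, terminating in rules:
            i = w.find(lhs)               # leftmost match, or -1
            if i >= 0:
                w = w[:i] + rhs + w[i + len(lhs):]
                break
        else:
            return w                      # step 1.2: no rule matches
        if terminating:
            return w                      # step 1.4
    raise RuntimeError("step limit reached; the algorithm may not halt")


def accepts(w: str) -> bool:
    return "Accept" in run_markov(ANBNCN, w)


if __name__ == "__main__":
    for s in ["", "abc", "aabbcc", "aabbc", "abcc", "ba"]:
        print(repr(s), accepts(s))
```

On the inputs listed in the usage loop the sketch accepts the empty string, abc, and aabbcc and does not accept the others. Tracing the same rules on abcabc also ends in Accept, because rule 8 leaves abc behind and the cycle simply starts again; so the rule list as transcribed here appears to need an additional check for symbols that are out of order, and the sketch is best read as an illustration of the interpreter.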
FIGURE 18.1 An example of the Game of Life.


18.2.7 Conway's Game of Life

The Game of Life was first proposed by John Conway. In the game, the board (the
world) starts out in some initial configuration in which each square is either alive
(shown in black) or dead (shown in white). A simple example is shown in Figure 18.1.

Life is not a game in the usual sense of having players. It is more like a movie that
we can watch. It proceeds in discrete steps. At each step, the value for each cell is
determined by computing the number of immediate neighbors (including the four on the
diagonals, so up to a maximum of eight) it currently has, according to the following rules:

•  A  dead  cell  with  exactly  three  live  neighbors  becomes  a  live  cell  (birth). 

•  A  live  cell  with  two  or  three  live  neighbors  stays  alive  (survival). 

•  In  all  other  cases,  a  cell  dies  or  remains  dead  (overcrowding  or  loneliness). 

Once  values  for  all  the  cells  at  the  next  step  have  been  determined,  all  of  them 
change  values  simultaneously.  Then  the  next  step  begins. 
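
The three rules and the simultaneous update can be written down in a few lines. The following is a minimal sketch, assuming an unbounded board represented as a Python set of (row, column) pairs for the live cells; the names `life_step` and `is_stable` are illustrative choices, not notation from the text.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]


def life_step(live: Set[Cell]) -> Set[Cell]:
    """Compute the next configuration from the set of live cells."""
    # Count, for every cell that has at least one live neighbor, how many
    # of its eight neighbors are alive.
    counts: Dict[Cell, int] = {}
    for (r, c) in live:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (dr, dc) != (0, 0):
                    key = (r + dr, c + dc)
                    counts[key] = counts.get(key, 0) + 1
    # Birth: exactly three live neighbors. Survival: a live cell with two
    # or three. Every other cell dies or stays dead. The new set is built
    # from the old one only, so all cells change simultaneously.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}


def is_stable(live: Set[Cell]) -> bool:
    """True if the configuration looks the same after one step."""
    return life_step(live) == live


if __name__ == "__main__":
    block = {(0, 0), (0, 1), (1, 0), (1, 1)}
    print(is_stable(block))        # a 2 x 2 block never changes
    print(life_step({(0, 0)}))     # a lone cell dies of loneliness
```

Storing only the live cells keeps the board unbounded; a fixed-size grid would need an extra rule for cells at the edge.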


Life is fascinating to watch.


Life can be played on a board of any size and it can be given any desired starting
configuration. Depending on the starting configuration, Life may end (all the cells die),
it may reach some other stable configuration (it looks the same from one step to the
next), or it may enter a cycle of configurations. We’ll say that the game of Life halts iff
it reaches some stable configuration.

We can imagine the Life simulator as a computing device that takes the initial board
configuration as input, knows one operation (namely how to move from one
configuration to the next), may or may not halt, and if it halts, produces some stable configuration
as its result. Conway and others have shown that, with an appropriate encoding of
Turing machines and input strings as board configurations, the operation of any Turing
machine can be simulated by the game of Life. And a Life simulator can be written as a
Turing machine. So Life is equivalent in power to a Turing machine.

18.2.8  One  Dimensional  Elementary  Cellular  Automata 

The game of Life can be thought of as a two-dimensional cellular automaton. Each square
looks at its neighboring cells in two dimensions to decide what should happen to it at the
next step. But we don’t need two dimensions to simulate a Turing machine. Wolfram [2002]
describes one-dimensional cellular automata that look like the one shown in Figure 18.2.

FIGURE 18.2 A one-dimensional cellular automaton.

FIGURE 18.3 Rule 110.

As in the game of Life, each cell is either on or off (black or white), an initial
configuration is specified, and the configuration of the automaton at each later step t is
determined by independently computing the value for each cell, which in turn is a function
solely of the values of itself and its neighbors (in this case two) at step t − 1.

In the game of Life, Conway specified the rule that is to be used to compute the value
of each cell at the next step. What rule shall we use for these one-dimensional automata?
Since each cell can have one of the two values (black or white) and each cell’s next
configuration depends on the current configuration of three cells (itself and its two
neighbors), there are 256 (2^8) rules that we could use. Each rule contains 8 (2^3) parts,
specifying what should happen next for each of the 8 possible current situations. Figure 18.3
shows the rule that Wolfram numbers 110. Wolfram describes a proof that Rule 110,
with an appropriate (and complex) encoding of Turing machines and strings as cellular
automata, is equivalent in power to the Turing machine.
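
The counting argument and the update rule can be made concrete. The sketch below assumes Wolfram's standard numbering, in which bit k of the rule number gives the next value of a cell whose (left, self, right) neighborhood, read as a three-bit binary number, equals k. It also assumes a finite row whose missing neighbors at the two ends are treated as off; that boundary choice is an assumption made here for illustration.

```python
from typing import List


def eca_step(cells: List[int], rule: int = 110) -> List[int]:
    """One step of an elementary cellular automaton on a finite row of 0/1
    cells. Cells beyond either end are treated as 0 (off)."""
    n = len(cells)
    nxt = []
    for i in range(n):
        left = cells[i - 1] if i > 0 else 0
        centre = cells[i]
        right = cells[i + 1] if i < n - 1 else 0
        index = (left << 2) | (centre << 1) | right   # 0..7
        nxt.append((rule >> index) & 1)               # look up that bit
    return nxt


if __name__ == "__main__":
    # 2 values per part and 2**3 parts per rule give 2**(2**3) = 256 rules.
    print(2 ** (2 ** 3))
    row = [0, 0, 0, 1, 0]
    for _ in range(3):
        print(row)
        row = eca_step(row)
```

Encoding a rule as an 8-bit number is what makes the count of 256 immediate: each of the 8 neighborhoods independently chooses one of two next values.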


18.2.9  DNA  Computing 


See K.1 for a very short introduction to molecular biology and genetics.


In 1993, Len Adleman observed that DNA molecules and Turing machine tapes both
do the same thing: They encode information as strings. Further, he observed that both
nature and the Turing machine offer simple operations for manipulating those strings.
So he wondered, can DNA compute? To begin to answer that question, he performed
a fascinating experiment. In a laboratory, he solved an instance of the Hamiltonian
path problem⁹ using DNA molecules. More precisely, what he did was the following:


⁹ The definition that Adleman uses for the Hamiltonian path problem is the following: Let G be a directed graph,
with one node s designated as the start node and another node d designated as the end node. A Hamiltonian path
through G is a path that begins at s, ends at d, and visits each other node in G exactly once. A Hamiltonian path
problem is then the following decision problem: Given a directed graph G, with designated s and d, does there
exist a Hamiltonian path through it? We will return to this problem in Part V. There we will use a slightly different
definition that asks for any Hamiltonian path through G. It will not specify a particular start and end vertex.

1. He chose a particular graph G (with 7 vertices and 14 edges).

2. He encoded each vertex of G as a sequence of 8 nucleotides. For example, a
vertex might be represented as ACCTGCAG.

3. He encoded each directed edge of G as a sequence of 8 nucleotides, namely the
last four from the encoding of the start vertex and the first four from the
encoding of the end vertex. So, for example, if there was an edge from ACTTGCAG to
TCGGACTG, then it would be encoded as GCAGTCGG.

4. He synthesized many copies of each of the edge sequences, as well as many
copies of the DNA complements¹⁰ of all the vertex encodings. So for example,
since one of the vertices was encoded as ACTTGCAG, its complement, the sequence
TGAACGTC, was synthesized.

5.  He  combined  the  vertex-complement  molecules  and  the  edge  molecules  in  a  test 
tube,  along  with  water,  salt,  some  important  enzymes,  and  a  few  other  chemicals 
required  to  support  the  natural  biological  processes. 

6. He allowed to happen the natural process by which complementary strands of
DNA in solution will meet and stick together (anneal). So for example, consider
again the edge GCAGTCGG. It begins at the vertex whose encoding is ACTTGCAG and
it ends at a vertex whose encoding is TCGGACTG. The complements of those
vertices are TGAACGTC and AGCCTGAC. So, in solution, the edge strands will anneal
with the vertex-complement strands to produce the double strand:

path of length one (i.e., one edge):        GCAGTCGG
complement of sequence of two vertices:     TGAACGTC|AGCCTGAC

But then, suppose that there is an edge from the second vertex to some third one.
Then that edge will anneal to the lower string that was produced above, generating:

path of length two:                         GCAGTCGG|ACTGGGCT
complement of sequence of two vertices:     TGAACGTC|AGCCTGAC


Then  a  third  vertex  may  anneal  to  the  right  end  of  the  path  sequence.  And  so 
forth.  Eventually,  if  there  is  a  path  from  the  start  vertex  to  the  end  one,  there  will 
be  a  sequence  of  fragments,  like  our  top  one,  that  corresponds  to  that  path. 

7. He allowed a second biological reaction to occur. The enzyme ligase that had
been added to the mixture joins adjacent sequences of DNA. So instead of
strands of fragments, as above, the following strands will be produced:

path of length two:                           GCAGTCGGACTGGGCT
complement of sequence of three vertices:     TGAACGTCAGCCTGACCCGATACA


¹⁰ Each DNA molecule is a double strand of nucleotide sequences. Each nucleotide contains one of the four
bases: adenine (A), thymine (T), guanine (G) and cytosine (C). Each of these has a complement: C and G are
complements and A and T are complements. When a double strand of DNA is examined as a sequence of
base pairs (one from each strand), every base occurs across from its complement. So, whenever one strand
has a C, the other has a G. And whenever one strand has an A, the other has a T.

8. He used the polymerase chain reaction (PCR) technique to make massive
numbers of copies of exactly those sequences that started at the start vertex and
ended at the end one. Other sequences were still present in the mix after this step,
but in much lower numbers.

9. He used gel electrophoresis to select only those molecules whose length
corresponded to a Hamiltonian path through the graph.

10. He checked that each of the vertices other than the source and the destination
did in fact occur in the selected molecules. To do this required one pass through
the following procedure for each intermediate vertex:

    10.1. Use a DNA probe that attracts molecules that contain a particular
    DNA sequence (i.e., the one for the vertex that is being checked).

    10.2. Use a magnet to attract the probes.

    10.3. Throw away the rest of the solution, thus losing those molecules that were
    not attached to the probe.

11. He checked that some DNA molecules remained at the end. Only molecules that
corresponded to paths that started at the start vertex, ended at the end vertex,
had the correct length for a path that visited each vertex exactly once, and
contained each of the vertices could still be present. So if any DNA was left, a
Hamiltonian path existed.
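
The encoding used in steps 2 through 4 is easy to check by hand or by program. The sketch below reproduces the sequences quoted in those steps; the function names `complement` and `edge_encoding` are illustrative, and the code models only the string encoding, not the chemistry.

```python
# Base pairing from footnote 10: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base complement of a nucleotide sequence."""
    return "".join(COMPLEMENT[base] for base in strand)


def edge_encoding(start_vertex: str, end_vertex: str) -> str:
    """Last four bases of the start vertex followed by the first four
    bases of the end vertex (step 3)."""
    return start_vertex[-4:] + end_vertex[:4]


if __name__ == "__main__":
    v1, v2 = "ACTTGCAG", "TCGGACTG"
    print(edge_encoding(v1, v2))            # the edge strand
    print(complement(v1), complement(v2))   # the vertex-complement strands
```

An edge strand anneals across the boundary between two vertex complements because its first half is complementary to the end of one and its second half to the start of the other.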

Since that early experiment, other scientists have tried other ways of encoding
information in DNA molecules and using biological operations to compute with it. The
question then arises: Is DNA computing Turing-equivalent? The answer depends on
exactly what we mean by DNA computing. In particular, what operations are allowed?
For example, must the model be limited to operations that can be performed only
by naturally occurring enzymes? It has been shown that, given some reasonable
assumptions about allowed operations, DNA computing is Turing-equivalent.


Exercises 

1. Church’s Thesis makes the claim that all reasonable formal models of
computation are equivalent. And we showed, in Section 17.4, a construction that proved
that a simple accumulator/register machine can be implemented as a Turing
machine. By extending that construction, we can show that any computer can be
implemented as a Turing machine. So the existence of a decision procedure (stated
in any notation that makes the algorithm clear) to answer a question means that
the question is decidable by a Turing machine.

Now suppose that we take an arbitrary question for which a decision
procedure exists. If the question can be reformulated as a language, then the language
will be in D iff there exists a decision procedure to answer the question. For each
of the following problems, your answers should be a precise description of an
algorithm. It need not be the description of a Turing Machine:

a. Let L = {<M> : M is a DFSM that doesn't accept any string containing an
odd number of 1’s}. Show that L is in D.

b. Let L = {<E> : E is a regular expression that describes a language that
contains at least one string w that contains 111 as a substring}. Show that L is in D.

c.  Consider  the  problem  of  testing  whether  a  DFSM  and  a  regular  expression 
are  equivalent.  Express  this  problem  as  a  language  and  show  that  it  is  in  D. 

2. Consider the language L = {w = xy : x, y ∈ {a, b}* and y is identical to x except
that each character is duplicated}. For example, ababaabbaabb ∈ L.

a.  Show  that  L  is  not  context-free. 

b.  Show  a  Post  system  (as  defined  in  Section  18.2.4)  that  generates  L. 

3. Show a Post system that generates AⁿBⁿCⁿ.

4.  Show  a  Markov  algorithm  (as  defined  in  Section  18.2.6)  to  subtract  two  unary 
numbers.  For  example,  on  input  111-1,  it  should  halt  with  the  string  11.  On  input 
1-111,  it  should  halt  with  the  string  -11. 

5.  Show  a  Markov  algorithm  to  decide  WW. 

6. Consider Conway’s Game of Life, as described in Section 18.2.7. Draw an
example of a simple Life initial configuration that is an oscillator, meaning that it
changes from step to step but it eventually repeats a previous configuration.


CHAPTER  19 


The  Unsolvability  of  the  Halting 
Problem 


So far, we have focused on solvable problems and we have described an
increasingly powerful sequence of formal models for computing devices that can
implement solutions to those problems. Our last attempt is the Turing machine and
we've shown how to use Turing machines to solve several of the problems that were
not solvable with a PDA or an FSM. The Church-Turing thesis suggests that, although
there are alternatives to Turing machines, none of them is any more powerful. So,
are we done? Can we build a Turing machine to solve any problem we can formally
describe?

Until a bit before the middle of the 20th century, western mathematicians believed
that it would eventually be possible to prove any true mathematical statement and to
define an algorithm to solve any clearly stated mathematical problem. Had they been
right, our work would be done. But they were wrong. And, as a consequence, the
answer to the question in the last paragraph is no. There are well-defined problems for
which no Turing machine exists.

In  this  chapter  we  will  prove  our  first  result  that  shows  the  limits  of  what  we  can 
compute.  In  later  chapters,  we  will  discuss  other  unsolvable  problems  and  we  will  see 
how to analyze new problems and then prove either that they are solvable or that they
are  not.  We  will  do  this  by  showing  that  there  are  languages  that  are  not  decidable  (i.e., 
they  are  not  in  D).  So,  recall  the  definitions  of  the  sets  D  and  SD  that  we  presented  in 
Chapter  17: 

• A Turing machine M with input alphabet Σ decides a language L ⊆ Σ* (or,
alternatively, implements a decision procedure for L) iff, for any string w ∈ Σ*:

   • if w ∈ L then M accepts w, and

   • if w ∉ L then M rejects w.

A language L is decidable (and thus an element of D) iff there is a Turing machine
M that decides it.

• A Turing machine M with input alphabet Σ semidecides a language L ⊆ Σ* (or,
alternatively, implements a semidecision procedure for L) iff, for any string w ∈ Σ*:

   • if w ∈ L then M accepts w, and

   • if w ∉ L then M does not accept w. (Note that M may fail to accept either by
   rejecting or by failing to halt.)

A language L is semidecidable (and thus an element of SD) iff there is a Turing
machine that semidecides it.

Many  of  the  languages  that  we  are  about  to  consider  are  composed  of  strings  that 
correspond,  at  least  in  part,  to  encodings  of  Turing  machines.  Some  of  them  may  also 
contain  other  fragments.  So  we  will  be  considering  languages  such  as: 

• L1 = {<M, w> : Turing machine M halts on input string w}.

• L2 = {<M> : there exists no string on which Turing machine M halts}.

• L3 = {<Ma, Mb> : Ma and Mb are Turing machines that halt on the same strings}.

Recall that <M> is the notation that we use for the encoding of a Turing machine
M using the scheme described in Section 17.6. <M, w> means the encoding of a pair
of inputs: a Turing machine M and an input string w. <Ma, Mb> means the encoding
of a pair of inputs, both of which are Turing machines.

Consider L1 above. It consists of the set of strings that encode a (Turing machine,
string) pair with the property that the Turing machine M, when started with w on its
tape, halts. So, in order for some string s to be in language L1, it must possess two
properties:

•  It  must  be  syntactically  well-formed. 

• It must encode a machine M and a string w such that M would halt if started on w.

We will be attempting to find Turing machines that can decide (or semidecide)
languages like L1, L2, and L3. Building a Turing machine to check for syntactic
validity is easy. We would like to focus on the other part. So, in our discussion of languages
such as these, we will define the universe from which we are drawing strings to be the
set that contains only those strings that meet the syntactic requirements of the
language definition. For example, that could be the set that contains descriptions of
Turing machines (strings of the form <M>), or the set that contains descriptions of a
Turing machine and a string (strings of the form <M, w>). This contrasts with the
convention we have been using up until now, in which the universe was Σ*, where Σ
is the alphabet over which L is defined.

This change in convention will be important whenever we talk about the
complement of a language such as L1, L2, or L3. So, for example, we have:

¬L1 = {<M, w> : Turing machine M does not halt on input string w}.

Note that this convention has no impact on the decidability of any of these
languages since the set of syntactically valid strings is in D. So it is straightforward to build
a precondition checker that accepts exactly the syntactically well-formed strings and
rejects all others.

19.1 The Language H is Semidecidable but Not Decidable

We begin by considering the language we called L1 in the last section. We're now going
to call it H, the halting problem language. So, define:

• H = {<M, w> : Turing machine M halts on input string w}.

H  is: 

•  Easy  to  state  and  to  understand. 

•  Of  great  practical  importance  since  a  program  to  decide  H  could  be  a  very  useful 
part  of  a  program-correctness  checker.  You  don’t  want  to  go  online  to  pay  a  bill  and 
have  the  system  go  into  an  infinite  loop  after  it  has  debited  your  bank  account  and 
before  it  credits  the  payment  to  your  electric  bill. 

•  Semidecidable. 

•  Not  decidable. 

We  need  to  prove  these  last  two  claims.  Before  we  attempt  to  do  that,  let's  consider 
them.  H  would  be  decidable  if  there  existed  an  algorithm  that  could  take  as  input  a 
program  M  and  an  input  w  and  decide  whether  M  will  halt  on  w.  It  is  easy  to  define 
such  an  algorithm  that  works  some  of  the  time.  For  example,  it  would  be  easy  to  design 
an  algorithm  that  could  discover  that  the  following  program  (and  many  others  like  it 
that  contain  no  loops)  halts  on  all  inputs: 

1.  Concatenate  0  to  the  end  of  the  input  string. 

2.  Halt. 

It would also be easy to design an algorithm that could discover that the following
program (and many others like it) halts on no inputs:

1. Concatenate 0 to the end of the input string.

2.  Move  right  one  square. 

3.  Go  to  step  1. 

But,  for  H  to  be  decidable,  we  would  need  an  algorithm  that  decides  the  question  in 
all  cases.  Consider  the  following  program: 

times3(x: positive integer) =

   While x ≠ 1 do:

      If x is even then x = x/2.

      Else x = 3x + 1.

It is easy to prove that times3 halts on any positive integer that is a power of 2. In
that case, x decreases each time through the loop and must eventually hit 1. But what
about other inputs? Will it halt, for example, on 23,478? It is conjectured that, for any
positive integer input, the answer to this question is yes. But, so far, no one has been
able either to prove that conjecture or to find a counterexample. The problem of
determining whether times3 must always halt is called the 3x + 1 problem.

So there appear to be programs whose halting behavior is difficult to determine. We
now prove that the problem of deciding halting behavior for an arbitrary (machine,
input) pair is semidecidable but not decidable.
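
The times3 program discussed above can be transcribed almost word for word. In the sketch below the returned step count and the `max_steps` guard are additions made for illustration; the guard exists only so that the function cannot run forever if some input were ever found on which the loop does not terminate.

```python
def times3(x: int, max_steps: int = 1_000_000) -> int:
    """Run the times3 loop on x and return the number of iterations
    needed to reach 1."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    steps = 0
    while x != 1:
        if steps >= max_steps:
            raise RuntimeError("step limit reached without hitting 1")
        if x % 2 == 0:
            x = x // 2          # x is even: halve it
        else:
            x = 3 * x + 1       # x is odd: triple it and add one
        steps += 1
    return steps


if __name__ == "__main__":
    for n in (1, 6, 8, 23478):
        print(n, times3(n))
```

For a power of 2 such as 8 the loop only ever halves, which is the easy case mentioned in the text; for other inputs the value can rise before it falls, which is what makes a general proof hard.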

THEOREM 19.1 Semidecidability of the Halting Problem

Theorem: The language H = {<M, w> : Turing machine M halts on input string w}
is semidecidable.

Proof: The proof is by construction of a semideciding Turing machine M_SH. The design
of M_SH is simple. All it has to do is to run M on w and accept if M halts. So:

M_SH(<M, w>) =

1. Run M on w.

2. Accept.

M_SH accepts iff M halts on w. Thus M_SH semidecides H.
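
The two-step construction can be mirrored in code. The sketch below is only an analogy: it assumes that a Turing machine is modelled as an ordinary Python callable that takes the input string and either returns (halts) or runs forever. The name `m_sh` and this way of modelling machines are assumptions made here, not part of the proof.

```python
from typing import Callable


def m_sh(machine: Callable[[str], object], w: str) -> bool:
    """Semidecider for halting: run the machine, then accept."""
    machine(w)      # Step 1: run M on w. This call may never return.
    return True     # Step 2: accept. Reached only if M halted on w.


if __name__ == "__main__":
    def append_zero(s: str) -> str:
        # A machine with no loops: it halts on every input.
        return s + "0"

    print(m_sh(append_zero, "1011"))
```

Note that `m_sh` never returns False. When the machine does not halt, the call simply never finishes, which is exactly the difference between semideciding H and deciding it.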

But  H  is  not  decidable.  This  single  fact  is  going  to  turn  out  to  be  the  cornerstone  of 
the  entire  theory  of  undecidability  that  we  will  discuss  in  the  next  several  chapters. 


Compilers  check  for  various  kinds  of  errors  in  programs.  But,  because  H  is 
undecidable, no compiler can offer a guarantee that a program is free of
infinite loops. (G.4.4)


THEOREM 19.2 Undecidability of the Halting Problem

Theorem: The language H = {<M, w> : Turing machine M halts on input string w}
is  not  decidable. 

Proof: If H were decidable, then there would be some Turing machine MH that
decided it. MH would implement the following specification:

halts(<M: string, w: string>) =

If <M> is the description of a Turing machine that halts on input w, then
accept; else reject.

Note that we have said nothing about how MH would work. It might use
simulation. It might examine M for loops. It might use a crystal ball. The only
claim we are making about MH is that it can implement halts. In other words, it
can decide somehow whether M halts on w and report True if it does and False
if it does not.

Now suppose that we write the specification for a second Turing machine,
which we'll call Trouble:

Trouble(x: string) =

If halts accepts <x, x>, then loop forever; else halt.

If there exists some MH that computes the function halts, then the Turing
machine Trouble also exists. We can easily write the code for it as follows: Assume
that C# is a Turing machine (similar to the copy machine that we showed in
Example 17.11) that writes onto its tape a second copy of its input, separated
from the first by a comma. Also assume that MH exploits the variable r, into
which it puts 1 if it is about to halt and accept and 0 if it is about to halt and
reject. Then, using the notation defined in Section 17.1.5, Trouble is shown in
Figure 19.1.

Trouble takes a single string x as its input. It makes a copy of that string, moves
its read/write head all the way back to the left, and then invokes MH on x, x. MH
will treat the first copy as a Turing machine and the second one as the input to
that Turing machine. When MH halts (which it must, since we've assumed that
it is a deciding machine), Trouble will either halt immediately or loop forever,
depending on whether MH stored a 0 or a 1 in r.

What happens if we now invoke Trouble(<Trouble>)? In other words, we invoke
Trouble on the string that corresponds to its own description, as shown in
the figure. Then Trouble will invoke MH(<Trouble, Trouble>). Since the second
argument of MH can be any string, this is a valid invocation of the function. What
should MH say?

• If MH reports that Trouble(<Trouble>) halts (by putting a 1 in the variable r),
then what Trouble actually does is to loop.

• But if MH reports that Trouble(<Trouble>) does not halt (by putting a 0 in
the variable r), then what Trouble actually does is to halt.

Thus there is no response that MH can make that accurately predicts the
behavior of Trouble(<Trouble>). So we have found at least one input on which
any implementation of halts must fail to report the correct answer. Thus there
exists no correct implementation of halts. This means that MH does not exist. So
H is not decidable.
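
The argument can be replayed against any concrete candidate for halts. The sketch below is an illustration in Python, not a proof: it assumes that a candidate is an ordinary function halts(machine, w) that returns True or False, and that a machine can be passed around as a Python function in place of its string description. The candidates optimist and pessimist are hypothetical. The helper refute reads Trouble's actual behavior off its definition rather than running it, because running it would never return when the prediction is True.

```python
def make_trouble(halts):
    """Build Trouble from a claimed implementation of halts(machine, w)."""
    def trouble(x):
        if halts(x, x):
            while True:      # halts said "x halts on x", so loop forever
                pass
        return "halted"      # halts said "x does not halt on x", so halt
    return trouble


def refute(halts):
    """Return (predicted, actual) for the call trouble(trouble)."""
    trouble = make_trouble(halts)
    predicted = halts(trouble, trouble)
    actual = not predicted   # by construction, Trouble does the opposite
    return predicted, actual


def optimist(machine, w):
    """Hypothetical candidate that claims every machine halts."""
    return True


def pessimist(machine, w):
    """Hypothetical candidate that claims no machine halts."""
    return False
```

For both candidates, refute returns a pair whose two components differ. The same happens for any other candidate one might write, since Trouble is built to contradict whatever answer the candidate gives about Trouble itself.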

FIGURE 19.1 A Turing machine that implements the function Trouble.

Table 19.1 Using diagonalization to construct Trouble.

There is another way to state this proof that makes it clearer that what we have just
done is to use diagonalization. Consider Table 19.1. To form column 0, we
lexicographically enumerate all Turing machines, using the procedure that was defined in
Section 17.6.2. To form row 0, we lexicographically enumerate all possible input strings
over the alphabet Σ that we used to encode inputs to the universal Turing machine. The
cell [i, j] of the table contains the value 1 if TMi halts on the jth input string and is
blank otherwise.

This  table  is  infinite  in  both  directions,  so  it  will  never  be  explicitly  constructed.  But, 
if  we  claim  that  the  Turing  machine  MH  exists,  we  are  claiming  that  it  can  compute  the 
correct  value  for  any  cell  in  this  table  on  demand.  Trouble  must  correspond  to  some 
row  in  the  table  and  so,  in  particular,  MH  must  be  able  to  compute  the  values  for  that 
row. The  string  <Trouble>  must  correspond  to  some  column  in  the  table.  What  value 
should  occur  in  the  black  cell  of  the  picture?  There  is  no  value  that  correctly  describes 
the behavior of Trouble, since we explicitly constructed it to look at the black cell and
then  do  exactly  the  opposite  of  what  that  cell  says. 

So  we  have  just  proven  (twice)  a  very  important  result  that  can  be  stated  in  any  one 
of  three  ways: 

•  The  language  H  is  not  decidable. 

•  The  halting  problem  is  unsolvable  (i.e.,  there  can  exist  no  implementation  of  the 
specification  we  have  given  for  the  halts  function). 

•  The  membership  problem  for  the  SD  languages  (i.e.,  those  that  can  be  accepted  by 
some Turing machine) is not solvable.

Recall  that  we  have  seen  many  times  that  any  decision  problem  that  we  can  state 
formally  can  be  restated  as  a  language  recognition  task.  So  it  comes  as  no  surprise  that 
this  one  can.  In  the  rest  of  this  book,  we  will  use  whichever  version  of  this  result  is 
clearer  at  each  point. 

19.2  Some  Implications  of  the  Undecidability  of  H 

We  now  have  our  first  example,  H,  of  a  language  that  is  semidecidable  (i.e.,  it  is  in  SD) 
but  that  is  not  decidable  (i.e.,  it  is  not  in  D).  What  we  will  see  in  the  rest  of  this  section 
is  that  H  is  far  more  than  an  anomaly.  It  is  the  key  to  the  fundamental  distinction 
between  the  classes  D  and  SD. 

THEOREM 19.3 H is the Key to the Difference Between D and SD

Theorem: If H were in D then every SD language would be in D.

Proof: Let L be any SD language. Since L is in SD, there exists a Turing machine
ML that semidecides it. Suppose H were also in D. Then it would be decided
by some Turing machine that we can call O (for oracle). To decide whether
some string w is in L, we can appeal to O and ask it whether ML will halt on
the input w. If the answer is yes, we can (without risk of getting into an
infinite loop) run ML on w and see whether or not it accepts. So, given ML (the
machine that semidecides L), we can build a new Turing machine M' that
decides L by appeal to O:

M'(w: string) =

1. Run O on <ML, w>.

2. If O accepts (which it will iff ML halts on w), then:

2.1. Run ML on w.

2.2. If it accepts, accept. Else reject.

3. Else reject.

Since O is a deciding machine for H, it always halts. If it reports that ML would
halt on w, then M' can run ML on w to see whether it accepts or rejects. If, on the
other hand, O reports that ML would not halt, then ML certainly cannot accept, so M'
rejects. So M' always halts and returns the correct answer. Thus, if H were in D, all
SD languages would be.
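
The construction of M' can be sketched in Python. Since no real oracle for H exists, the sketch supplies a hand-written stand-in, toy_oracle, that answers correctly for one particular toy semidecider only. That stand-in, the generator-based model of a machine (yield once per step, return True to accept), and all the names below are assumptions made for illustration.

```python
def run(machine, w):
    """Run a generator-style machine to completion; return True iff it accepts."""
    steps = machine(w)
    try:
        while True:
            next(steps)
    except StopIteration as done:
        return bool(done.value)


def make_decider(oracle, m_l):
    """Build M' from an oracle O for halting and a semidecider ML."""
    def m_prime(w):
        if oracle(m_l, w):        # steps 1 and 2: does ML halt on w?
            return run(m_l, w)    # steps 2.1 and 2.2: safe to run ML now
        return False              # step 3: ML never halts, so w is not in L
    return m_prime


def even_length(w):
    """Toy semidecider for {w : |w| is even}; loops on odd-length strings."""
    while len(w) % 2 == 1:
        yield
    return True


def toy_oracle(machine, w):
    """Hand-written stand-in for O, valid only for even_length."""
    assert machine is even_length
    return len(w) % 2 == 0
```

With these definitions, make_decider(toy_oracle, even_length) always returns an answer, even on odd-length strings where even_length alone would run forever. The sketch shows only how M' uses O; it does not, of course, show that O can be built in general.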

But H is not in D. And as we are about to see, it is not alone.

19.3  Back  to  Turing,  Church,  and  the 
Entscheidungsproblem 

At the beginning of Chapter 18, we mentioned that Turing invented the Turing
machine because he was attempting to answer the question, "Given a set of
axioms A and a sentence s, does there exist an algorithm to decide whether s is
entailed by A?" To do that, he needed a formal definition of an algorithm, which the
Turing machine provided. As an historical aside, we point out here that in Turing's
model, machines (with the exception of a universal machine that could simulate
other machines) were always started on a blank tape. So, while in H we ask
whether a Turing machine M halts on some particular input w, Turing would ask
simply whether it halts. But note that this is not a significant change. In our model,
all inputs are of finite length. So it is possible to encode any particular input in the
states of a machine that is to operate on it. That machine can start out with a blank
tape, write the desired input on its tape, and then continue as though the tape had
contained the input.
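
The idea of building a fixed input into a machine has a familiar programming analogue. The short Python sketch below treats a machine as an ordinary function of its tape contents and wraps it in a zero-argument function that first "writes" the input and then proceeds as the original would. The names hardwire and reverse are hypothetical, and the function model is an assumption made for illustration.

```python
def hardwire(machine, w):
    """Return a machine that starts on a blank tape and behaves like machine(w)."""
    def blank_tape_machine():
        tape = w              # write the desired input on the tape
        return machine(tape)  # continue as though the tape had contained it
    return blank_tape_machine


def reverse(tape):
    """Toy stand-in for a Turing machine: returns its input reversed."""
    return tape[::-1]
```

Asking whether hardwire(reverse, "abc") halts when started with no input is the same question as asking whether reverse halts on "abc".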

Having defined the Turing machine (which he called simply a "computing machine"),
Turing  went  on  to  show  the  unsolvability  of  the  halting  problem.  He  then  used  that 
result  to  show  that  no  solution  to  the  Entscheidungsproblem  (the  problem  of  deciding 
whether  s  is  entailed  by  A )  can  exist.  An  outline  of  Turing's  proof  is  the  following: 

1.  If  we  could  solve  the  problem  of  determining  whether  a  given  Turing  machine 
ever prints the symbol 0, then we could solve the problem of determining
whether  a  given  Turing  machine  halts.  Turing  presented  the  technique  by  which 
this  could  be  done. 

2.  But  we  can’t  solve  the  problem  of  determining  whether  a  given  Turing  machine 
halts,  so  neither  can  we  solve  the  problem  of  determining  whether  it  ever  prints  0. 

3. Given a Turing machine M, we can construct a logical formula F that is a theorem,
given  the  axioms  of  Peano  arithmetic,  iff  M  ever  prints  the  symbol  0.  Turing  also 
presented  the  technique  by  which  this  could  be  done. 

4.  If  there  were  a  solution  to  the  Entscheidungsproblem,  then  we  would  be  able  to 
determine  the  theoremhood  of  any  logical  sentence  and  so,  in  particular,  we  could 
use  it  to  determine  whether  F  is  a  theorem.  We  would  thus  be  able  to  decide 
whether  M  ever  prints  the  symbol  0. 

5.  But  we  know  that  there  is  no  procedure  for  determining  whether  M  ever  prints  0. 

6.  So  there  is  no  solution  to  the  Entscheidungsproblem. 

This  proof  is  an  example  of  the  technique  that  we  will  use  extensively  in 
Chapter  21  to  show  that  problems  are  not  decidable.  We  reduce  a  problem  that  is 
already  known  not  to  be  decidable  to  a  new  problem  whose  decidability  is  in 
question.  In  other  words,  we  show  that  if  the  new  problem  were  decidable  by 
some Turing machine M, then we could use M as the basis for a procedure to
decide the old problem. But, since we already know that no solution to the old
problem can exist, no solution for the new one can exist either. The proof we just
sketched uses this technique twice: once in steps 1 and 2 to show that we cannot
solve the problem of determining whether a Turing machine ever prints the
symbol 0, and a second time, in steps 3 through 6, to show that we cannot solve the
Entscheidungsproblem. 


Exercises 

1. Consider the language L = {<M> : Turing machine M accepts at least two
strings}.

a. Describe in clear English a Turing machine M that semidecides L.

b. Now change the definition of L just a bit. Consider:

L' = {<M> : Turing machine M accepts exactly 2 strings}.

Can you tweak the Turing machine you described in part a to semidecide L'?

2. Consider the language L = {<M> : Turing machine M accepts the binary
encodings of the first three prime numbers}.

a. Describe in clear English a Turing machine M that semidecides L.

b. Suppose (contrary to fact, as established by Theorem 19.2) that there were a
Turing machine Oracle that decided H. Using it, describe in clear English a
Turing machine M that decides L.


CHAPTER  20 


Decidable  and  Semidecidable 
Languages 

Now that we have shown that the halting problem is undecidable, it should be
clear  why  we  introduced  the  notion  of  a  semidecision  procedure.  For  some 
problems,  it  is  the  best  we  will  be  able  to  come  up  with.  In  this  chapter  we 
explore  the  relationship  between  the  classes  D  and  SD,  given  what  we  now  know  about 
the  limits  of  computation. 


20.1  D:  The  Big  Picture 

First,  we  observe  that  the  class  D  includes  the  regular  and  the  context-free  languages. 

More  precisely: 

THEOREM  20.1  All  Context-Free  Languages,  Plus  Others,  are  in  D 
Theorem:  The  set  of  context-free  languages  is  a  proper  subset  of  D. 

Proof:  By  Theorem  14.1,  the  membership  problem  for  the  context-free  languages  is 
decidable.  So  the  context-free  languages  are  a  subset  of  D.  And  there  is  at  least 
one language, AⁿBⁿCⁿ, that is decidable but not context-free. So the context-free
languages are a proper subset of D.


20.2  SD:  The  Big  Picture 

Now what can we say about the relationship between D and the larger class SD? Almost
every  language  you  can  think  of  that  is  in  SD  is  also  in  D.  Examples  include: 

• AⁿBⁿCⁿ = {aⁿbⁿcⁿ : n ≥ 0},

• WcW = {wcw : w ∈ {a, b}*},

• WW = {ww : w ∈ {a, b}*}, and

• {w of the form: x*y = z, where: x, y, z ∈ {0, 1}* and, when x, y, and z are
viewed as binary numbers, x·y = z}.

But  there  are  languages  that  are  in  SD  but  not  in  D.  We  already  know  one: 

• H = {<M, w> : Turing machine M halts on input string w}.

What  about  others?  It  isn’t  possible  to  come  up  with  any  physical  examples  since 
there  are  only  finitely  many  molecules  in  the  observable  universe.  So  every  physical  set 
is  finite  and  thus  regular.  But  unless  we  want  to  model  all  our  real  world  problems 
using  only  the  power  of  a  finite  state  machine,  we  generally  ignore  the  fact  that  the  true 
language is finite and model it as a more complex set that is unbounded and thus, for all
practical  purposes,  infinite.  If  we  do  that,  then  here's  a  language  that  is  effectively  in 
SD  and  has  the  look  and  feel  of  many  SD  languages: 

• L = {w : w is the email address of someone who will respond to a message you just
posted  to  your  newsgroup}. 

If  someone  responds,  you  know  that  their  email  address  is  in  L.  But  if  your  best  friend 
hasn't responded yet, you don't know that she isn't going to. All you can do is wait.

In  Chapter  21  we  will  see  that  any  question  that  asks  about  the  result  of  running  a 
Turing  machine  is  undecidable  (and  so  its  corresponding  language  formulation  is  not 
in  D).  In  a  nutshell,  if  you  can’t  think  of  a  way  to  answer  the  question  by  simulating 
the  Turing  machine,  it  is  very  likely  that  there  is  no  other  way  to  do  it  and  the  ques¬ 
tion  is  undecidable.  But  keep  in  mind  that  we  said  that  the  question  must  ask  about 
the result of running the Turing machine. Questions that ask simply about the Turing
machine itself (e.g., how many states does it have) or about its behavior partway
through  its  computation  (e.g.,  what  does  it  do  after  exactly  100  steps)  are  generally 
decidable. 

In  Chapter  22  we  will  see  some  examples  of  undecidable  problems  that  do  not  ask 
questions  about  Turing  machines.  If  you'd  like  to  be  convinced  that  this  theory  applies 
to  more  than  the  analysis  of  Turing  machines  (or  of  programs  in  general),  skip  ahead 
briefly  to  Chapter  22. 

In  this  chapter  we  will  look  at  properties  of  four  classes  of  languages  and  see  how  they 
relate  to  each  other. The  classes  we  will  consider  are  shown  in  Figure  20.1.  They  are: 

•  D,  corresponding  to  the  inner  circle  of  the  figure,  the  set  of  decidable  languages. 

•  SD,  corresponding  to  the  outer  circle  of  the  figure,  the  set  of  semidecidable  languages. 

•  SD/D,  corresponding  to  the  donut  in  the  figure,  the  set  of  languages  that  are  in  SD 
but  not  D. 

• ¬SD, corresponding to the grey area in the figure, the set of languages that are not
even  semidecidable. 

FIGURE 20.1 The relationships between D and SD.


20.3  Subset  Relationships  between  D  and  SD 

The  picture  that  we  just  considered  implicitly  makes  three  claims  about  the  relationship 
between  the  classes  D  and  SD.  From  the  inside  out  they  are: 

1.  D  is  a  subset  of  SD.  In  other  words,  every  decidable  language  is  also  semidecidable. 

2.  There  exists  at  least  one  language  that  is  in  SD  but  not  D  and  so  the  donut  in  the 
picture  is  not  empty. 

3.  There  exist  languages  that  are  not  in  SD.  In  other  words,  the  gray  area  of  the  figure 
is  not  empty. 

We  have  already  proven  the  second  of  these  claims:  In  Chapter  19  we  described 
H  =  { <M,  w>  :  Turing  machine  M  halts  on  input  string  w}  and  showed  that  H  is  not 
in  D  but  is  in  SD.  We  now  consider  each  of  the  other  two  claims. 


THEOREM  20.2  D  is  a  Subset  of  SD 

Theorem:  Every  decidable  language  is  also  semidecidable. 

Proof:  The  proof  follows  directly  from  the  definitions  of  deciding  and  semideciding 
Turing  machines.  If  L  is  in  D,  then  it  is  decided  by  some  Turing  machine  M. 
M  therefore  accepts  all  and  only  the  strings  in  L.  So  M  is  also  a  semideciding 
machine for L. Since there is a Turing machine that semidecides L, it is in SD.


Next  we  consider  whether  the  class  SD  includes  all  languages  or  whether  there  are 
languages  that  are  not  even  semidecidable.  As  Figure  20.1  suggests  (by  the  existence  of 
the gray region), the answer is that there are languages that are not in SD.

THEOREM 20.3 Not All Languages are in SD

Theorem:  There  exist  languages  that  are  not  in  SD. 

Proof: We will use a counting argument. Assume any nonempty alphabet Σ. First
we prove the following lemma:

Lemma: There is a countably infinite number of SD languages over Σ.

Proof of Lemma: Every semidecidable language is semidecided by some Turing
machine. We can lexicographically enumerate all the syntactically legal Turing
machines with input alphabet Σ. That enumeration is infinite, so, by Theorem
A.1, there is a countably infinite number of semideciding Turing machines. There
cannot be more SD languages than there are semideciding Turing machines, so
there is at most a countably infinite number of SD languages. There is not a
one-to-one correspondence between SD languages and semideciding Turing
machines since there is an infinite number of machines that semidecide any given
language. But the number of SD languages must be infinite because it includes
(by Theorem 20.1 and Theorem 20.2) all the context-free languages and, by
Theorem 13.2, there are an infinite number of them. So there is a countably infinite
number of SD languages.

Proof of Theorem: There is an uncountably infinite number of languages over Σ
(by Theorem 2.2). So there are more languages over Σ than there are in SD. Thus
there must exist at least one language that is in ¬SD.


We will see our first example of a language that is in ¬SD in the next section.

20.4  The  Classes  D  and  SD  Under  Complement 

The  regular  languages  are  closed  under  complement.  The  context  free  languages  are 
not.  What  about  the  decidable  (D)  languages  and  the  semidecidable  (SD)  languages? 

THEOREM  20.4  The  Decidable  Languages  are  Closed  Under  Complement 
Theorem:  The  class  D  is  closed  under  complement. 

Proof: The proof is by a construction that is analogous to the one we used to show
that the regular languages are closed under complement. Let L be any decidable
language. Since L is in D, there is some deterministic Turing machine M that decides
it. Recall that a deterministic Turing machine must be completely specified
(i.e., there must be a transition from every nonhalting state on every character in
the tape alphabet), so there is no need to worry about a dead state. From M we
construct M' to decide ¬L. Initially, let M' = M. Now swap the y and n states.
M' halts and accepts whenever M would halt and reject; M' halts and rejects
whenever M would halt and accept. Since M always halts, so does M'. And M'
accepts exactly those strings that M would reject, i.e., ¬L. Since there is a deciding
machine for ¬L, it is in D.
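
In programming terms, swapping the y and n states amounts to negating the answer of a function that always returns. The Python sketch below assumes that a decider is modelled as a total function from strings to True or False; the names complement and is_anbn are hypothetical.

```python
def complement(decider):
    """Given a total decider for L, return a decider for the complement of L."""
    def co_decider(w):
        return not decider(w)   # swap the roles of the y and n halting states
    return co_decider


def is_anbn(w):
    """Toy decider for the language of strings a^n b^n with n >= 0."""
    n = len(w) // 2
    return w == "a" * n + "b" * n
```

The construction depends on the decider always halting. Applied to a machine that only semidecides L, the same wrapper would still fail to return on every string on which the original loops, which is why the argument does not carry over to SD.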

THEOREM 20.5 The Semidecidable Languages are not Closed Under
Complement

Theorem: The class SD is not closed under complement.

Proof: The proof is by contradiction. Suppose the class SD were closed under
complement. Then, given any language L in SD, ¬L would also be in SD. So there
would be a Turing machine M that semidecides L and another Turing machine
M' that semidecides ¬L. From those two we could construct a new Turing machine
M# that decides L. On input w, M# will simulate M and M', in parallel, running
on w. Since w must be an element of either L or ¬L, one of M or M' must
eventually accept. If M accepts, then M# halts and accepts. If M' accepts, then M#
halts and rejects. So, if the SD languages were closed under complement, then all
SD languages would also be in D. But we know from Chapter 19 that
H = {<M, w> : Turing machine M halts on input string w} is in SD but not D.


These  last  two  theorems  give  us  a  new  way  to  prove  that  a  language  L  is  in  D  (or,  in 
fact,  a  way  to  prove  that  a  language  is  not  in  SD): 

THEOREM 20.6 L and ¬L Both in SD is Equivalent to L is in D

Theorem: A language L is in D iff both it and its complement ¬L are in SD.

Proof: We prove each direction of the implication:

Proof that L in D implies L and ¬L are in SD: Because L is in D, it must also be in SD
by Theorem 20.2. But what about ¬L? By Theorem 20.4, the class D is closed under
complement, so ¬L is also in D. And so, using Theorem 20.2 again, it is also in SD.

Proof that L and ¬L are in SD implies L is in D: The proof is by construction and
uses the same construction that we used to prove Theorem 20.5: Since L and ¬L
are in SD, they each have a semideciding Turing machine. Suppose L is
semidecided by M and ¬L is semidecided by M'. From those two we construct a new
Turing machine M# that decides L. On input w, M# will simulate M and M', in
parallel, running on w. Since w must be an element of either L or ¬L, one of M
or M' must eventually accept. If M accepts, then M# halts and accepts. If M'
accepts, then M# halts and rejects. Since M# decides L, L is in D.
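
The parallel simulation performed by M# can be sketched in Python by advancing two computations alternately, one step at a time. As a modelling assumption, a machine here is a generator function that yields once per step and returns True when it accepts. The helper accepts_forever_otherwise and the other names are hypothetical and serve only to build test machines that loop on non-members.

```python
def accepts_forever_otherwise(predicate):
    """Build a toy semidecider: accept if predicate(w), else loop forever."""
    def machine(w):
        yield                     # one step of work
        if predicate(w):
            return True           # halt and accept
        while True:
            yield                 # never halts on non-members
    return machine


def decide_from_pair(m, m_co):
    """Build M# from semideciders m (for L) and m_co (for the complement of L)."""
    def m_sharp(w):
        live = [(m(w), True), (m_co(w), False)]
        while live:
            still_running = []
            for steps, verdict in live:
                try:
                    next(steps)                 # one step of this machine
                except StopIteration as done:
                    if done.value:              # it halted and accepted
                        return verdict
                    continue                    # halted and rejected: drop it
                still_running.append((steps, verdict))
            live = still_running
        raise RuntimeError("neither machine accepted: not a valid pair")
    return m_sharp
```

Running the two machines one after the other would not work, since the first might never halt. Interleaving single steps guarantees that whichever machine is going to accept gets the steps it needs.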


We can use Theorem 20.6 to prove our first example of a language that is not in SD:

THEOREM 20.7 ¬H is not in SD

Theorem: The language ¬H (the complement of H) = {<M, w> : Turing machine
M does not halt on input string w} is not in SD.

Proof: Recall that we are defining the complement of languages involving Turing
machine descriptions with respect to the universe of syntactically well-formed
strings. From Theorem 19.1, we know that H is in SD (since we showed a
semideciding Turing machine for it). By Theorem 20.6 we know that if ¬H were also in
SD then H would be in D. But, by Theorem 19.2, we know that H is not in D. So
¬H is not in SD.


20.5  Enumerating  a  Language 

In most of our discussion so far, we have defined a language by specifying either a
grammar  that  can  generate  it  or  a  machine  that  can  accept  it.  But  it  is  also  possible  to 
specify  a  machine  that  is  a  generator.  Its  job  is  to  enumerate  (in  some  order)  the  strings 
of  the  language.  We  will  now  explore  how  to  use  a  Turing  machine  to  do  that. 


20.5.1  Enumerating  in  Some  Undefined  Order 

To generate a language L, we need a Turing machine M whose job is to start with a
blank tape, compute for a while, place some string in L on the tape, signal that we
should snapshot the tape to record its contents, and then go back and do it all
again. If L is finite, we can construct M so that it will eventually halt. If L is infinite, M
must continue generating forever. If a Turing machine M behaves in this way and
outputs all and only the strings in L, then we say that M enumerates L. Any enumerating
Turing machine M must have a special state that we will call p (for print). Whenever M
enters p, the shortest string that contains all the nonblank characters on M's tape will
be considered to have been enumerated by M. Note that p is not a halting state. It
merely signals that the current contents of the tape should be viewed as a member of
L. M may also have a halting state if L is finite. Formally, we say that a Turing machine
M enumerates L iff, for some fixed state p of M,

L = {w : (s, ε) |-M* (p, w)}.


A  language  L  is  Turing-enumerable  iff  there  is  a  Turing  machine  that  enumerates  it. 
Note  that  we  are  making  no  claim  here  about  the  order  in  which  the  strings  in  L  are 
generated. 

To  make  it  easy  to  describe  enumerating  Turing  machines  in  our  macro  language, 
we'll define the simple subroutine P, shown in Figure 20.2. It simply enters the state p
and  halts. 


FIGURE  20.2  A  subroutine  that  takes  a  snapshot  of  the  tape. 

EXAMPLE 20.1 Enumerating in Lexicographic and in Random Order

Consider the language a*. Here are two different Turing machines that enumerate it:

M1: > P a R (with a loop back to P)

M2: [machine diagram]

M1 enumerates a* in lexicographic order. M2 enumerates it in a less
straightforward order. It will produce the sequence ε, a, aaa, aa, aaaaa, aaaa, ...


So now we have one mechanism for using a Turing machine to generate a language
and a separate mechanism for using a Turing machine to accept one. Is there any
relationship between the class of Turing-enumerable languages and either the class of
decidable languages (D) or the class of semidecidable languages (SD)? The answer is yes.
The class of languages that can be enumerated by a Turing machine is identical to SD.
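
Before proving this, it may help to see enumerators in a concrete form. The two enumeration orders of Example 20.1 can be mimicked with Python generators, where each yield plays the role of entering the print state p. This is a sketch under that modelling assumption; the generator m2_order reproduces only the output order stated for M2, not the internal moves of the machine, and all names are hypothetical.

```python
from itertools import islice


def m1():
    """Enumerate a* in lexicographic order: '', 'a', 'aa', ..."""
    tape = ""
    while True:
        yield tape        # P: snapshot the tape
        tape += "a"       # write another a and move right


def m2_order():
    """Enumerate a* in the order '', 'a', 'aaa', 'aa', 'aaaaa', 'aaaa', ..."""
    yield ""
    yield "a"
    n = 3
    while True:
        yield "a" * n         # the longer string of each pair comes first
        yield "a" * (n - 1)   # then the string one shorter
        n += 2


def first(k, enumerator):
    """Collect the first k strings produced by an enumerator."""
    return list(islice(enumerator(), k))
```

Both generators eventually produce every string in a* and nothing else, so both enumerate the language. They differ only in order, which the definition of Turing-enumerable deliberately leaves open.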

THEOREM 20.8 Turing Enumerable is Equivalent to Semidecidable

Theorem: A language is in SD (i.e., it can be semidecided by some Turing machine)
iff it is Turing-enumerable.

Proof: We must do two proofs, one that shows that if a language is Turing
enumerable then it is in SD and another that shows that if a language is in SD then it is
Turing enumerable.

Proof  that  if  a  language  is  Turing  enumerable  then  it  is  in  SD:  If  a  language  L 
is  Turing  enumerable  then  there  is  some  Turing  machine  M  that  enumerates  it. 
We convert M to a machine M' that semidecides L:

M'(w: string) =

1. Save input w on a second tape.

2. Invoke M, which will enumerate L. Each time an element of L is
enumerated, compare it to w. If they match, halt and accept. Otherwise, continue
with the enumeration.

Because  there  is  a  Turing  machine  that  semidecides  L,  it  is  in  SD.  Figure  20.3 
illustrates  how  M'  works. 
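
The conversion from an enumerator to a semidecider is a short loop. In the Python sketch below, an enumerating machine is modelled as a generator function that yields the strings of L one at a time. That model, and the names semidecider_from and even_as, are assumptions made for illustration.

```python
def semidecider_from(enumerator):
    """Build M' from an enumerating machine modelled as a generator function."""
    def m_prime(w):
        saved = w                        # step 1: save w
        for candidate in enumerator():   # step 2: enumerate L
            if candidate == saved:
                return True              # match: halt and accept
        return False  # reached only if the enumeration ends (L is finite)
    return m_prime


def even_as():
    """Toy enumerator for the strings of a's of even length."""
    tape = ""
    while True:
        yield tape
        tape += "aa"
```

On a string that is in L, m_prime returns True as soon as the enumeration reaches it. On a string that is not in an infinite L, such as "aaa" here, the loop never ends, which is acceptable for a semidecider.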

FIGURE 20.3 Using an enumerating machine in a semidecider.

Proof that if a language is in SD then it is Turing enumerable: If L ⊆ Σ* (for
some Σ) is in SD, then there is a Turing machine M that semidecides it. We will
use M to construct a new machine M' that enumerates L. The idea behind M' is
that it will lexicographically enumerate Σ*. It will consider each of the strings it
enumerates as a candidate for membership in L. So it will pass each such string to
M. Whenever M accepts some string w, M' will output it. The problem is that M is
not guaranteed to halt. So what happens if M' invokes M on a string that is not in
L and M loops? If we are not careful, M' will wait forever and never give other
strings a chance.

To solve this problem, M' will not just invoke M and sit back and wait to see
what happens. It will carefully control the execution of M. In particular, it will
invoke M on string1 and let it compute one step. Then it will consider string2. It will
allow M to compute one step on string2 and also one more step on string1. Then
it will consider string3, this time trying the new string3 for one step and applying
one more step to the computations on string2 and on string1. Anytime M accepts
some string in this sequence, M' will output that string. If there is some string s
that is not in L, then the computation corresponding to s will either halt and
reject or fail to halt. In either case, M' will never output s.

This pattern is shown in Figure 20.4. Each column corresponds to a candidate
string and each row corresponds to one stage of the process. At each stage, a new
string is added and one more step is executed for each string that is already being
considered but on which M has not yet halted. The number of steps that have
been executed on each string so far is shown in brackets. If M does halt on some
string (as, for example, b, in the chart below), that column will simply be skipped
at future stages.

We  will  call  the  technique  that  we  just  described  dovetailing.  It  will  turn  out  to 
be  useful  for  other  similar  kinds  of  proofs  later. 

FIGURE 20.4 Using dovetailing to control simulation.

So a description of M' is:

M'() =

1. Enumerate all w ∈ Σ* lexicographically. As each string w_i is enumerated:

1.1. Start up a copy of M with w_i as its input.

1.2. Execute one step of each M_i initiated so far, excluding only those that
have previously halted.

2. Whenever an M_i accepts, output w_i.
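
The dovetailing procedure can be tried out in ordinary code. The following Python sketch is an illustration only, not a Turing machine construction: a semidecider is modeled (by assumption) as a generator function that yields once per simulated step and then returns True or False, and the names lexicographic_strings, dovetail_enumerate, and toy_semidecider are hypothetical. The toy machine accepts strings with an even number of a's and loops forever on all others.

```python
from itertools import count, product


def lexicographic_strings(alphabet):
    """Yield every string over alphabet: shorter strings first, then alphabetically."""
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def dovetail_enumerate(semidecider, alphabet, stages):
    """Run a bounded number of dovetailing stages; return accepted strings in output order.

    semidecider(w) must return a generator that yields once per simulated step
    and finally returns True (accept) or False (reject). It may never finish.
    """
    candidates = lexicographic_strings(alphabet)
    running = []   # (string, computation) pairs that have not halted yet
    accepted = []
    for _ in range(stages):
        w = next(candidates)
        running.append((w, semidecider(w)))        # step 1.1: start a new copy
        still_running = []
        for s, computation in running:             # step 1.2: one step of each copy
            try:
                next(computation)
                still_running.append((s, computation))
            except StopIteration as halted:
                if halted.value:                   # step 2: output accepted strings
                    accepted.append(s)
        running = still_running                    # halted copies are skipped from now on
    return accepted


def toy_semidecider(w):
    """Hypothetical M: accepts strings with an even number of a's, loops on the rest."""
    if w.count("a") % 2 == 0:
        for _ in range(len(w) + 1):                # pretend to work for a few steps
            yield
        return True
    while True:                                    # never halts on the other strings
        yield


print(dovetail_enumerate(toy_semidecider, "ab", 12))
```

The looping computations (for example, on a and ab) never block the others, because each one receives only a single step per stage. A real enumerator would run forever; the stages bound exists only so that the sketch terminates.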


20.5.2  Enumerating  in  Lexicographic  Order 

So far, we have said nothing about the order in which the strings in L are enumerated
by M. But now suppose we do. We say that M lexicographically enumerates L iff M
enumerates the elements of L in lexicographic order. A language L is lexicographically
Turing-enumerable iff there is a Turing machine that lexicographically enumerates it.

Now we can ask whether there is any relationship between the class of lexicographically
Turing-enumerable languages and any of the other classes we have already defined.
Just as we found in the last section, in the case of unordered enumeration, we
discover that the answer is yes. The class of languages that can be lexicographically
enumerated by a Turing machine is identical to D.


THEOREM  20.9  Lexicographically  Turing  Enumerable  is  Equivalent  to  Being 
Decidable 


Theorem:  A  language  is  in  D  iff  it  is  lexicographically  Turing-enumerable. 

Proof:  Again  we  must  do  two  proofs,  one  for  each  direction  of  the  implication. 

Proof  that  if  a  language  is  in  D  then  it  is  lexicographically  Turing  enumerable: 

If a language L ⊆ Σ* (for some Σ) is in D, then there is some Turing machine M
that decides it. Using M, we can build M', which lexicographically generates the
strings in Σ* and tests them, one at a time, by passing them to M. Since M is a
deciding machine, it halts on all inputs, so dovetailing is not required here. If, on
string w, M halts and accepts, then M' outputs w. If M halts and rejects, then M'
just skips w and goes on to the next string in the lexicographic enumeration. Thus
M' lexicographically enumerates L. The relationship between M and M' can be
seen in Figure 20.5.

FIGURE 20.5 Using a decider in a lexicographic enumerator.

Proof that if a language is lexicographically Turing enumerable then it is in D:

If a language L is lexicographically Turing enumerable, then there is some Turing
machine M that lexicographically enumerates it. Using M, we can build M', which,
on input w, starts up M and waits until either M generates w (in which case M'
accepts w), M generates a string that comes after w in the enumeration (in which
case M' rejects because it is clear that M will never go back and generate w), or M
halts (in which case M' rejects because M failed to generate w). Thus M' decides
L. The relationship between M and M' can be seen in Figure 20.6.

FIGURE 20.6 Using a lexicographic enumerator in a decider.
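
Both directions of this proof can be mirrored in a few lines of Python. This is a sketch under stated assumptions rather than the construction itself: a decider is modeled as a total Python predicate, an enumerator as a generator, lexicographic order is taken to mean shorter strings first and then alphabetical order, and the alphabet is assumed to be given in sorted order. All function names are hypothetical.

```python
from itertools import count, islice, product


def lexicographic_strings(alphabet):
    """Yield every string over alphabet: shorter strings first, then alphabetically."""
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def shortlex_key(w):
    """Sort key for the order used by lexicographic_strings."""
    return (len(w), w)


def enumerate_with_decider(decides, alphabet):
    """D to enumerator: test each candidate in order; no dovetailing is needed."""
    for w in lexicographic_strings(alphabet):
        if decides(w):          # decides halts on every input by assumption
            yield w


def decide_with_enumerator(enumeration, w):
    """Enumerator to D: watch the ordered output until the answer is forced."""
    for s in enumeration:
        if s == w:
            return True         # w was generated: accept
        if shortlex_key(s) > shortlex_key(w):
            return False        # the enumeration has moved past w: reject
    return False                # the enumerator halted without generating w: reject


def is_palindrome(w):
    """Example decider: the palindromes over {a, b}."""
    return w == w[::-1]


print(list(islice(enumerate_with_decider(is_palindrome, "ab"), 6)))
print(decide_with_enumerator(enumerate_with_decider(is_palindrome, "ab"), "aba"))
print(decide_with_enumerator(enumerate_with_decider(is_palindrome, "ab"), "ab"))
```

The second function depends on the ordering guarantee: as soon as a string later than w appears, w can never be produced, so rejecting is safe.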


20.6  Summary 

In this chapter we have considered several ways in which the classes D and SD are
related and we have developed theorems that give us ways to prove that a specific
language L is in D and/or SD. Figure 20.7 attempts to summarize these results. The
column labeled IN lists our techniques for proving that a language is in the
corresponding language class. The column labeled OUT lists our techniques for proving
that a language is not in the corresponding language class. We have listed reduction
here for completeness. We will present reduction as a proof technique in Chapter 21.
And we have mentioned unrestricted grammars, which we will discuss in Chapter 23.
You'll also note, in the figure, one example language in each class.

[FIGURE 20.7 not reproduced. Surviving entries of the OUT column, from top to
bottom: Reduction; Diagonalization, or Reduction; Pumping, or Closure; Pumping, or
Closure.]

Exercises 

1.  Show  that  the  set  D  (the  decidable  languages)  is  closed  under: 

a.  Union 

b.  Concatenation 

c. Kleene star

d. Reverse

e. Intersection

2. Show that the set SD (the semidecidable languages) is closed under:

a. Union

b. Concatenation

c. Kleene star

d.  Reverse 

e.  Intersection 

3. Let L1, L2, ..., L_k be a collection of languages over some alphabet Σ such that:

• For all i ≠ j, L_i ∩ L_j = ∅.

• L1 ∪ L2 ∪ ... ∪ L_k = Σ*.

• ∀i (L_i is in SD).

Prove that each of the languages L1 through L_k is in D.

4. If L1 and L3 are in D and L1 ⊆ L2 ⊆ L3, what can we say about whether L2 is in D?

5. Let L1 and L2 be any two decidable languages. State and prove your answer to
each of the following questions:

a. Is it necessarily true that L1 - L2 is decidable?

b. Is it possible that L1 ∪ L2 is regular?

6. Let L1 and L2 be any two undecidable languages. State and prove your answer to
each of the following questions:

a. Is it possible that L1 - L2 is regular?

b. Is it possible that L1 ∪ L2 is in D?

7. Let M be a Turing machine that lexicographically enumerates the language L.
Prove that there exists a Turing machine M' that decides L^R.

8. Construct a standard one-tape Turing machine M to enumerate the language:

{w : w is the binary encoding of a positive integer that is divisible by 3}.

Assume that M starts with its tape equal to ❑. Also assume the existence of
the printing subroutine P, defined in Section 20.5.1. As an example of how to
use P, consider the following machine, which enumerates L', where
L' = {w : w is the unary encoding of an even number}:

> P R 1 R 1    (with an arrow from the final 1 looping back to P)

You  may  find  it  useful  to  define  other  subroutines  as  well. 

9. Construct a standard one-tape Turing machine M to enumerate the language
A^nB^n. Assume that M starts with its tape equal to ❑. Also assume the existence of
the printing subroutine P, defined in Section 20.5.1.

10. If w is an element of {0, 1}*, let ¬w be the string that is derived from w by
replacing every 0 by 1 and every 1 by 0. So, for example, ¬011 = 100. Consider
an infinite sequence S defined as follows:

S_0 = 0.

S_(n+1) = S_n ¬S_n.

The first several elements of S are 0, 01, 0110, 01101001,
0110100110010110. Describe a Turing machine M to output S. Assume that M
starts with its tape equal to ❑. Also assume the existence of the printing
subroutine P, defined in Section 20.5.1, but now with one small change: If M is a
multitape machine, P will output the value of tape 1. (Hint: Use two tapes.)

11. Recall the function mix, defined in Example 8.23. Neither the regular
languages nor the context-free languages are closed under mix. Are the decidable
languages closed under mix? Prove your answer.

12. Let Σ = {a, b}. Consider the set of all languages over Σ that contain only even
length strings.

a.  How  many  such  languages  are  there? 

b.  How  many  of  them  are  semidecidable? 

13.  Show  that  every  infinite  semidecidable  language  has  a  subset  that  is  not  decidable. 


CHAPTER  21 


Decidability  and  Undecidability 
Proofs 


We  now  know  two  languages  that  are  not  in  D: 

• H = {<M, w> : Turing machine M halts on input w}

• ¬H = {<M, w> : Turing machine M does not halt on input w} (which also isn't in SD)

In this chapter we will see that they are not alone. Recall that we have two
equivalent ways to describe a question: as a language (in which case we ask whether
it is in D), and as a problem (in which case we ask whether it is decidable or
whether it can be solved). Although all of our proofs will be based on the language
formulation, it is sometimes easier, particularly for programmers, to imagine the
question in its problem formulation. Table 21.1 presents a list, stated both ways,
of some of the undecidable questions that we will consider in this and succeeding
chapters.


Table 21.1 The problem and the language view.

The Problem View                             The Language View

Given a Turing machine M and a string w,     H = {<M, w> : TM M halts on input w}
does M halt on w?

Given a Turing machine M and a string w,     ¬H = {<M, w> : TM M does not halt on
does M not halt on w?                        input w}

Given a Turing machine M, does M halt on     H_ε = {<M> : TM M halts on ε}
the empty tape?

Given a Turing machine M, is there any       H_ANY = {<M> : there exists at least one
string on which M halts?                     string on which TM M halts}

Given a Turing machine M, does M accept      A_ALL = {<M> : L(M) = Σ*}
all strings?

Given two Turing machines Ma and Mb, do      EqTMs = {<Ma, Mb> : L(Ma) = L(Mb)}
they accept the same languages?

Given a Turing machine M, is the language    TM_REG = {<M> : L(M) is regular}
that M accepts regular?

Some of these languages are also not in SD. We will return to them in Section 21.6,
where we will see how to prove that languages are not in SD.

The primary technique that we will use here to show that a language L is not in D is
reduction. We will show that if L were in D, we could use its deciding machine to decide
some other language that we already know is not decidable. Thus we can conclude that
L is not decidable either.

21.1 Reduction

We  reduce  a  problem  to  one  or  more  other  problems  when  we  describe  a  solution  to 
the  first  problem  in  terms  of  solutions  to  the  others.  We  generally  choose  to  reduce  to 
simpler  problems,  although  sometimes  it  makes  sense  to  pick  problems  just  because  we 
already have solutions for them. Reduction is ubiquitous in everyday life, puzzle
solving, mathematics, and computing.


EXAMPLE  21.1  Calling  Jen 

We want to call our friend Jen but don't have her number. But we know that Jim
has it. So we reduce the problem of finding Jen's number to the problem of
getting hold of Jim.


The most important property of a reduction is clear even in the very simple example
of finding Jen's number:

The reduction exists AND there is a procedure that works for getting hold of Jim
IMPLIES we will have Jen's number.

But what happens if there is no way to get hold of Jim? Does that mean that we
cannot find Jen's number? No. There may be some other way to get it.

If, on the other hand, we knew (via some sort of oracle) that there is no way we
could ever end up with Jen's number, and if we still believed in the reduction (i.e., we
believed that Jim knows Jen's number and would be willing to give it to us), we would
be forced to conclude that there exists no effective procedure for getting hold of Jim.


EXAMPLE  21.2  Crisis  Detection 

Suppose  that  we  want  to  know  whether  there  is  some  sort  of  crisis  brewing  in  the 
world,  our  city,  or  the  company  we  work  for.  We'd  like  to  ask  the  Pentagon,  the  city 
council,  or  top  management,  but  they  probably  won’t  tell  us.  But  perhaps  we  can 
reduce this question to one we can answer: Has there been a spike this week in
orders for middle-of-the-night pizza delivery to: the Pentagon, the town hall,
corporate headquarters? This reduction will work provided all of the following are true:

•  There  will  be  all-nighters  at  the  specified  locations  if  and  only  if  there  is  a 
crisis. 

•  There  will  be  a  spike  in  middle-of-the-night  pizza  orders  if  and  only  if  there  are 
all-nighters  there. 

•  It  is  possible  to  find  out  about  pizza  orders. 


The  crisis-detection  example  illustrates  a  common  use  of  reduction:  We  wish  to  solve 
a  problem  but  have  no  direct  way  of  doing  so.  So  we  look  for  a  way  to  transform  the 
problem  we  care  about  into  some  other  problem  that  we  can  solve.  The  transformation 
must  have  the  property  that  the  answer  to  this  new  problem  provides  the  answer  to  the 
original  one. 


EXAMPLE  21.3  Fixing  Dinner 

We can reduce the problem of fixing dinner to a set of simpler problems: Fix the
entree, fix the salad, and fix the dessert.


EXAMPLE  21.4  Theorem  Proving 

Suppose that we want to establish Q(A) and that we have, as a theorem:

∀x (R(x) ∧ S(x) ∧ T(x) → Q(x)).

Then we can reduce the problem of proving Q(A) to three new ones: proving R(A),
S(A), and T(A).


Backward chaining solves problems by reducing complex goals to simpler
ones until direct solutions can be found. It is used in theorem provers and in
a variety of kinds of automatic reasoning and intelligent systems. (M.2.3)


These last two examples illustrate an important kind of reduction, often called divide
and conquer. One problem is reduced to two or more problems, all of which must be
solved in order to produce a solution to the original problem. But each of the new
problems is assumed to be easier to solve than the original one was.

EXAMPLE 21.5 Nim

Nim¹¹ starts with one or more piles of sticks. Two players take turns removing
sticks from the piles. At each turn, a player chooses a pile and may remove some
or all of the sticks from that pile. The player who is left with no sticks to remove
loses. For example, an initial configuration of a Nim game could be the following,
in which the sticks are arranged in three piles:

[Figure not reproduced: three piles containing 4, 2, and 5 sticks.]


Consider  the  problem  of  determining  whether  there  is  any  move  that  we  can 
make  that  will  guarantee  that  we  can  win.  The  obvious  way  to  solve  this  problem 
is  to  search  the  space  of  legal  moves  until  we  find  a  move  that  makes  it  impossible 
for  the  other  player  to  win.  If  we  find  such  a  move,  we  know  that  we  can  force  a 
win.  If  we  don’t,  then  we  know  that  we  cannot.  But  the  search  tree  can  be  very 
large and keeping track of it is nearly impossible for people. So how can we
answer the question?

We  can  reduce  the  problem  of  searching  a  Nim  game  tree  to  a  simple  problem 
in  Boolean  arithmetic.  We  represent  the  number  of  sticks  in  each  pile  as  a  binary 
number,  arrange  the  numbers  in  a  column,  lining  up  their  low-order  digits,  and 
then  apply  the  exclusive-or  (XOR)  operator  to  each  column.  So,  in  the  example 
above,  we’d  have: 

100 (4)

010 (2)

101 (5)

011

If the resulting string is in 0+, then the current board position is a guaranteed
loss for the current player. If the resulting string is not in 0+, then there is a move
by which the current player can assure that the next position will be a guaranteed
loss for the opponent. So, given a Nim configuration, we can decide whether we
can guarantee a win by transforming it into the XOR problem we just described
and then checking to see that the result of the XOR is not in 0+.

In addition, we can easily extend this approach so that it tells us what move we
should make. All that is required is to choose one number (i.e., one pile of sticks)
and subtract from it some number such that the result of XORing together the
new counts will yield some string in 0+. There may be more than one such move,
but it suffices just to find the first one. So we try the rows one at a time. In our
example, we quickly discover that if we remove one stick from the second pile (the
one currently containing two sticks), then we get:

100 (4)

001 (1)

101 (5)

000

So we remove one stick from the second pile. No search of follow-on moves is
required.

¹¹ This description is taken from [Misra 2004].
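
The XOR test and the search for a move fit in a few lines of Python. This is a minimal sketch with hypothetical function names, assuming the rule stated in the example (the player left with no sticks to remove loses) and representing a position as a list of pile sizes.

```python
from functools import reduce
from operator import xor


def nim_sum(piles):
    """XOR the pile sizes together (the column-wise XOR of their binary forms)."""
    return reduce(xor, piles, 0)


def can_force_win(piles):
    """The player to move can force a win iff the XOR result is not all zeros."""
    return nim_sum(piles) != 0


def winning_move(piles):
    """Return (pile_index, sticks_to_remove) for the first pile that works, else None."""
    total = nim_sum(piles)
    for index, size in enumerate(piles):
        target = size ^ total      # the size this pile needs so that the new XOR is 0
        if target < size:          # only removing sticks is a legal move
            return index, size - target
    return None


piles = [4, 2, 5]
print(nim_sum(piles))        # 3, i.e. binary 011
print(can_force_win(piles))  # True
print(winning_move(piles))   # (1, 1): remove one stick from the second pile
```

Trying the piles in order matches the text's strategy of taking the first row that works. Pile indices are zero-based, so index 1 is the second pile.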


Some combinatorial problems can be solved easily by reducing them to graph
problems. (E.3)


EXAMPLE  21.6  Computing  a  Function 

Suppose that we have access only to a very simple calculator that can perform
integer addition but not multiplication. We can reduce the problem of computing xy to
the problem of computing a + b as follows:

multiply(x: integer, y: integer) =
    answer = 0.
    For i = 1 to y do:
        answer = answer + x.
    Return answer.


21.2  Using  Reduction  to  Show  that  a  Language 
is  Not  Decidable 

So far, we have used reduction to show that problem1 is solvable if problem2 is. Now we
will turn the idea around and use it to show that problem2 is not solvable given that we
already know that problem1 isn't. Reduction, as we are about to use it, is a proof by
contradiction technique. We will say, "Suppose that problem2 were decidable. Then we
could use its decider as a subroutine that would enable us to solve problem1. But we
already know that there is no way to solve problem1. So there isn't any way to solve
problem2 either."

EXAMPLE 21.7 Dividing an Angle

Given an arbitrary angle, divide it into sixths, using only a straightedge and a
compass. We show that there exists no general procedure to solve this problem.
Suppose that there were such a procedure, which we'll call sixth. Then we could define
the following procedure to trisect an arbitrary angle:

trisect(a: angle) =

1. Divide a into six equal parts by invoking sixth(a).

2. Ignore every other line, thus dividing a into thirds.

So we have reduced the problem of trisecting an angle to the problem of dividing
it into sixths. But we know that there exists no procedure for trisecting an
arbitrary angle using only a straightedge and compass. The proof of that claim relies
on a branch of mathematics known as Galois theory after the French mathematician
Évariste Galois, who was working on the problem of discovering solutions
for polynomials of arbitrary degree. An interesting tidbit from the history of
mathematics: Galois's work in this area was done while he was still a teenager, but
was not published during his lifetime, which ended when he was killed in a duel in
1832 at age 20.

If sixth existed, then trisect would exist. But we know that trisect cannot exist.
So neither can sixth.


In the rest of this chapter, we are going to construct arguments of exactly this form to
show that various languages are not in D because H = {<M, w> : Turing machine M
halts on input string w} isn't. We'll then extend the technique to show that some languages
are not in SD either (because ¬H isn't). But, before we do that, we should note one very
important thing about arguments of this sort: Solvability (and decidability) results can
hinge on the details of the specifications of the problems involved. For example, let's
reconsider the angle trisection problem. This time, instead of the requirement, "using
only a straightedge and a compass," we'll change the rules to "in origami." Now, it
turns out that it is possible to trisect an angle using the paper folding and marking
operations that origami provides. We will have to be very careful to state exactly
what we mean in specifying the languages that we are about to consider.


In the rest of this chapter, we are going to use reduction in a very specific way. The goal
of a reduction is to enable us to describe a decision procedure for a language L1 by using
a decision procedure (which we will call Oracle) that we hypothesize exists for some
other language L2. Furthermore, since our goal is to develop a decision procedure (i.e.,
design a Turing machine), we are interested only in reductions that are themselves
computable (i.e., can be implemented as a Turing machine that is guaranteed to halt). So the
precise meaning of reduction that we will use in the rest of this book is the following:

A reduction R from L1 to L2 consists of one or more Turing machines with the following
property: If there exists a Turing machine Oracle that decides (or semidecides) L2,
then the Turing machines in R can be composed with Oracle to build a deciding (or a
semideciding) Turing machine for L1. The idea is that the machines in R perform the
straightforward parts of the task, while we assume that Oracle can do a good deal of the work.¹²


¹² It is common to define a reduction as a function, rather than as a Turing machine. But, when that is done,
we require that the function be computable. Since the computable functions are exactly the functions that
can be computed by some Turing machine, these two definitions are equivalent.

We will focus on the existence of deciding Turing machines now. Then, in Section
21.6, we will use this same idea when we explore the existence of semideciding
machines. We will use the notation P ≤ P' to mean that P is reducible to P'. While we
require that a reduction be one or more Turing machines, we will allow the use of clear
pseudocode as a way to specify the machines. Because the key property of a reduction,
as we have just defined it, is that it be computable by Turing machines, reducibility in
this sense is sometimes called Turing reducibility.

Since our focus in the rest of Part IV is on answering the question, "Does there exist a
Turing machine to decide (or semidecide) some language L?" we will accept as a
reduction any collection of Turing machines that meets the definition that we just gave. If, in
addition, we cared about the efficiency of our (semi)deciding procedure, we would also
have to care about the efficiency of the reduction. We will discuss that issue in Part V.

Having defined reduction precisely in terms of Turing machines, we can now return
to the main topic of the rest of this chapter: How can we use reduction to show that
some language L2 is not decidable? When we reduce L1 to L2 via a reduction R, we
show that if L2 is in D then so is L1 (because we can decide it with a composition of the
machines in R with the Oracle that decides L2). So what if we already know that L1 is
not in D? Then we have just shown that L2 isn't either.

To see why this is so, recall that the definition of reduction tells us that:

(R is a reduction from L1 to L2) ∧ (L2 is in D) → (L1 is in D).

If (L1 is in D) is false, then at least one of the two antecedents of that implication must be
false. So if we know that (R is a reduction from L1 to L2) is true, then (L2 is in D) must be false.
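
The propositional step in this argument can be checked mechanically. The Lean 4 fragment below is a sketch in which R, D1, and D2 are placeholder propositions (an assumption of this illustration) standing for "R is a reduction from L1 to L2", "L1 is in D", and "L2 is in D". It checks only the logical form of the argument and says nothing about Turing machines themselves.

```lean
-- If (R ∧ D2 → D1) holds, R holds, and D1 is false, then D2 is false.
example (R D1 D2 : Prop) (h : R ∧ D2 → D1) (hR : R) (hD1 : ¬D1) : ¬D2 :=
  fun hD2 => hD1 (h ⟨hR, hD2⟩)
```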

We now have a way to show that some new language L2 is not in D: We find a
language that is reducible to L2 and that is known not to be in D. We already have one
language, H, that is not in D. So we can use it to prove that other languages aren't either.

Figure 21.1 shows the form of this argument graphically. The solid arrow indicates
the reduction R from L1 to L2. The other two arrows correspond to implication. So the
diagram says that if L1 is reducible to L2 then we know (as shown in the upward
implication) that if L2 is in D, so is L1. But L1 is known not to be in D. So (as shown in the
downward implication) we know that L2 is not in D either.

The important thing about this diagram is the direction of the arrows. We reduce L1
to L2 to show that the undecidability of L1 guarantees that L2 is also undecidable. As
we do our reduction proofs, we must be careful always to reduce a known undecidable
language to the unknown one. The most common mistake in doing reduction proofs is
to do them backwards.

L1 (known not to be in D)                 L1 in D            But L1 not in D
    |                                        ^                      |
    | R                                      |                      |
    v                                        |                      v
L2 (a new language whose              if L2 in D             L2 not in D
    decidability we are
    trying to determine)

FIGURE 21.1 Using reduction for undecidability.

Summarizing what we have said: To use reduction to show that a language L2 is not
in D, we need to do three things:

1. Choose a language L1 to reduce from. We must choose an L1

• that is already known not to be in D, and

• that can be reduced to L2 (i.e., there would be a deciding machine for it if
there existed a deciding machine for L2).

2. Define the reduction R and describe the composition C of R with Oracle, the
machine that we hypothesize decides L2.

3. Show that C does correctly decide L1 if Oracle exists. We do this by showing

• that R can be implemented as one or more Turing machines, and

• that C is correct, meaning that it correctly decides whether its input x is an
element of L1. To do this, we must show that:

• If x ∈ L1, then C(x) accepts, and

• If x ∉ L1, then C(x) rejects.


21.2.2 Mapping Reducibility

The most straightforward way to reduce one problem, which we'll call A, to another,
which we'll call B, is to find a way to transform instances of A into instances of B. Then
we simply hand the transformed input to the program that solves B and return the
result. In Example 21.5, we illustrated this idea in our solution to the problem of
determining whether or not we could force a win in the game of Nim. We transformed a
problem involving a pile of sticks into a Boolean XOR problem. And we did it in such
a way that a procedure that determined whether the result of the XOR was nonzero
would also tell us whether we could force a win. So our reduction consisted of a single
procedure transform. Then we argued that, if XOR-solve solved the Boolean XOR
problem, then XOR-solve(transform(x)) correctly decided whether x was a position
from which we could guarantee a win.

In the specific context of attempting to solve decision problems, we can formalize
this idea as follows: Given an alphabet Σ, we will say that L1 is mapping reducible to L2,
which we will write as L1 ≤M L2, iff there exists some computable function f such that:

∀x ∈ Σ* (x ∈ L1 iff f(x) ∈ L2).

In general, the function f gives us a way to transform any value x into a new value x'
so that we can answer the question, "Is x in L1?" by asking instead the question, "Is x'
in L2?" If f can be computed by some Turing machine R, then R is a mapping reduction
from L1 to L2. So, if L1 ≤M L2 and there exists a Turing machine Oracle that decides
L2, then the following Turing machine C, which is simply the composition of Oracle
with R, will decide L1:


C(x) = Oracle(R(x)).
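
The composition can be seen at work on a toy pair of decidable languages. The Python sketch below is an illustration under assumptions: the two languages, the function f, and all names are hypothetical and chosen only so that the code can run. L1 is the set of even-length strings over {a, b}, L2 is the set of strings containing an even number of a's, and f maps w to a string of len(w) a's, so w is in L1 iff f(w) is in L2.

```python
from itertools import product


def compose(oracle, reduction):
    """Build C(x) = Oracle(R(x)) from a decider for L2 and a mapping reduction R."""
    return lambda x: oracle(reduction(x))


def f(w):
    """Hypothetical mapping reduction R from L1 to L2: keep only the length of w."""
    return "a" * len(w)


def oracle_l2(w):
    """Decider for L2 = {w : w contains an even number of a's}."""
    return w.count("a") % 2 == 0


decide_l1 = compose(oracle_l2, f)

# Check the defining property (x in L1 iff f(x) in L2) on all short strings.
for n in range(5):
    for letters in product("ab", repeat=n):
        x = "".join(letters)
        assert decide_l1(x) == (len(x) % 2 == 0)

print(decide_l1("abba"), decide_l1("bab"))
```

In the undecidability proofs that follow, the same composition is used hypothetically: Oracle is only assumed to exist, and the contradiction comes from the fact that C would then decide a language known not to be in D.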

The first several reduction proofs that we will do use mapping reducibility. In the
first few, we show that a new language L2 is not in D because H can be reduced to it.
Once we have done several of those proofs, we'll have a collection of languages, all of
which have been shown not to be in D. Then, for a new proof that some language L2 is
not in D, it will suffice to show that any one of the others can be reduced to it.


THEOREM 21.1 "Does M Halt on ε?" is Undecidable

Theorem: The language H_ε = {<M> : Turing machine M halts on ε} is in SD/D.

Proof: We will first show that H_ε is in SD. Then we will show that it is not in D.

We show that H_ε is in SD by exhibiting a Turing machine T that semidecides it.

T operates as follows:

T(<M>) =

1. Run M on ε.

2. Accept.

T accepts <M> iff M halts on ε, so T semidecides H_ε.

Next  w>e  show  that  H  sM  Hf.  and  so  HK  is  not  in  D.  We  will  define  a  mapping 
reduction  R  whose  job  will  be  to  map  instances  of  H  to  instances  of  He  in  such  a 
way  that,  if  there  exists  a  Turing  machine  (which  wc  will  call  Oracle)  that  decides 
Hk,  then  Oracle(R(<M.  «»>))  will  decide  H. 

R  will  transform  any  input  of  the  form  <M .  w>  into  a  new  string,  of  the  form 
<M>,  suitable  as  input  to  Oracle.  Specifically,  what  R  does  is  to  build  a  new  Tur¬ 
ing  machine,  which  we  will  call  MU.  that  halts  on  e  iff  M  halts  on  w.  One  way  to  do 
that  is  to  build  MU  so  that  it  completely  ignores  its  own  input. That  means  that  it 
will  halt  on  everything  (including  e)  or  nothing.  And  we  need  for  it  to  halt  on 
everything  precisely  in  case  M  would  halt  on  w. That's  easy.  Let  MU  simply  run  M 
on  w.  It  will  hall  on  everything  iff  M  halts  on  w.  Note  that  MU.  like  every  Turing 
machine  has  an  input,  namely  whatever  is  on  its  tape  when  it  begins  to  execute. 
So  we  ll  define  a  machine  MU(x),  where  x  is  the  name  we'll  give  to  MU' s  input 
tape.  We  must  do  that  even  though,  in  this  and  some  other  cases  well  consider,  it 
happens  that  the  behavior  of  MU  doesn't  depend  on  what  its  input  tape  contains. 

So let R be a mapping reduction from H to H_ε defined as follows:

R(<M, w>) =

1. Construct the description <M#> of a new Turing machine M#(x) that,
on input x, operates as follows:

1.1. Erase the tape.

1.2. Write w on the tape.

1.3. Run M on w.

2. Return <M#>.
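A rough programming analogy for R, offered as a sketch rather than as the construction itself: suppose we model a Turing machine as a Python callable (an assumption made only for this illustration; build_m_sharp and toy_machine are hypothetical names). Then R is a function that returns a new callable and never runs M itself.

```python
def build_m_sharp(M, w):
    """Model of R(<M, w>): return M# without executing M."""
    def m_sharp(x):
        # Steps 1.1 and 1.2: discard the input x and use w instead.
        tape = w
        # Step 1.3: run M on w. If M loops on w, this call never returns.
        M(tape)
        # Reaching this line means M halted on w, so M# halts too.
        return "halted"
    return m_sharp

def toy_machine(s):
    # Stand-in for a machine that halts on every input.
    return len(s)

m_sharp = build_m_sharp(toy_machine, "aba")
print(m_sharp(""), m_sharp("any other input"))
```

The point of the closure is that building M# always terminates, even when M would not halt on w; only a later call to M# can get stuck.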

We claim that if Oracle exists and decides H_ε, then C = Oracle(R(<M, w>))
decides H. To complete the proof, we need to show that R corresponds to a
computable function (i.e., that it can be implemented by a Turing machine) and
that C does in fact decide H:

• R can be implemented as a Turing machine: R must construct <M#> from
<M, w>. To see what M# looks like, suppose that w = aba. Then M# will
sweep along its input tape, blanking it out. Then it will write the string aba,
move its read/write head back to the left, and, finally, pass control to M. So, in
our macro language, M# will be:

[Diagram: >R; while the square just read is not □, write □ and repeat R; when □ is read, continue with:]

a R b R a L□ M

The procedure for constructing M#, given an arbitrary M and w, is:

1. Write the following code, which erases the tape:

[Diagram: >R; while the square just read is not □, write □ and repeat R; exit when □ is read]


2. For each character c in w do:

2.2. Write c.

2.3. If c is not the last character in w, write R.

3. Write L□ M.
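Steps 2 and 3 of this procedure are mechanical enough to sketch directly. The function below is a hypothetical illustration (the name write_and_run_macro and the choice to emit the macro as a space-separated string are assumptions); it produces only the part of M# that follows the tape-erasing loop.

```python
def write_and_run_macro(w: str) -> str:
    """Return the macro text that writes w and then hands control to M."""
    parts = []
    for i, c in enumerate(w):
        parts.append(c)              # write character c
        if i < len(w) - 1:
            parts.append("R")        # move right unless c is the last character
    parts.append("L□")               # move the head back to the left
    parts.append("M")                # pass control to M
    return " ".join(parts)

print(write_and_run_macro("aba"))    # a R b R a L□ M
```

Because the loop runs once per character of w, the construction plainly terminates for every w, which is what is needed for R to be computable.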

• C is correct: M# ignores its own input. It halts on everything or nothing. Think
of its step 1.3 as a gate. The computation only makes it through the gate if M
halts on w. If that happens then M# halts, no matter what its own input was.
Otherwise, it loops in step 1.3. So:

• If <M, w> ∈ H: M halts on w, so M# halts on everything. In particular, it
halts on ε. Oracle(<M#>) accepts.

• If <M, w> ∉ H: M does not halt on w, so M# halts on nothing and thus
not on ε. Oracle(<M#>) rejects.

But no machine to decide H can exist, so neither does Oracle.

This result may seem surprising. It says that if we could decide whether some Turing
machine M halts on the specific string ε, then we could solve the more general
problem of deciding whether a machine M halts on an arbitrary input. Clearly, the
other way around is true: If we could decide H (which we cannot), then we could decide
whether M halts on any one particular string. But doing a reduction in that direction
would tell us nothing about whether H_ε is decidable. The significant thing
that we just saw in this proof is that there also exists a reduction in the direction that
does tell us that H_ε is not decidable.

To  understand  the  reduction  proof  that  we  just  did  (and  all  the  others  that  we  are 
about  to  do),  keep  in  mind  that  it  involves  two  different  kinds  of  languages: 

• H and H_ε: The strings in H_ε are encodings of Turing machines, so they look like

(q000,a000,q001,a010,→), (q000,a000,q001,a010,←), ...

The strings in H are similar, except that they also include a particular w,
so they look like

(q000,a000,q001,a010,→), (q000,a000,q001,a010,←), ...; aabb

• The language on which some particular Turing machine M, whose membership in
either H or H_ε we are trying to determine, halts: Since M can be any Turing machine,
the set of strings on which M halts can be anything. It might, for example, be
AⁿBⁿ, in which case it would contain strings like aaabbb. It could also, of course, be
a language of Turing machine descriptions, but it will help to keep from getting confused
if you think of M's whose job is to recognize languages like AⁿBⁿ that are
very different from H.

The  proof  also  referred  to  five  different  Turing  machines: 

1. Oracle (the hypothesized, but provably nonexistent, machine to decide H_ε).

2. R (the machine that builds M#). This one actually exists.

3. C (the composition of R with Oracle).

4. M# (the machine whose description we will pass as input to Oracle). Note
that M# will never actually be executed.

5. M (the machine whose behavior on the input string w we are interested in
determining). Its description is input to R.

Figure 21.2 shows a block diagram of C. It illustrates the relationship among the five
machines.

[Block diagram of C: input <M, w>; outputs Accept and Reject]


FIGURE 21.2 The relationships among C, R, and Oracle.




THEOREM 21.2 "Does M Halt on Anything?" is Undecidable

Theorem: The language H_ANY = {<M> : there exists at least one string on which
Turing machine M halts} is in SD/D.

Proof: Again, we will first show that H_ANY is in SD. Then we will show that it is not in D.
We show that H_ANY is in SD by exhibiting a Turing machine T that semidecides
it. We could try building T so that it simply runs M on all strings in Σ* in lexicographic
order. If it finds one that halts, it halts. But, of course, the problem with
this approach is that if M fails to halt on the first string T tries, T will get stuck and
never try any others. So we need to try the strings in Σ* in a way that prevents T
from getting stuck. We build T so that it operates as follows:

T(<M>)  = 

1. Use the dovetailing technique described in the proof of Theorem 20.8 to
try M on all of the elements of Σ* until there is one string on which M
halts. Recall that, in dovetailing, we run M on one step of the first string,
then another step on that string plus one step on the next, and so forth,
as shown here (assuming Σ = {a, b}):

ε [1]

ε [2]  a [1]

ε [3]  a [2]  b [1]

ε [4]  a [3]  b [2]  aa [1]

ε [5]  a [4]  b [3]  aa [2]  ab [1]

ε [6]  a [5]  b [4]  aa [3]  ab [2]  ba [1]

2. If any instance of M halts, halt and accept.

T will accept iff M halts on at least one string. So T semidecides H_ANY.
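The dovetailing schedule can be sketched in a few lines. This is an illustration under stated assumptions, not the construction of Theorem 20.8: a machine is modeled as a Python generator function that yields once per simulated step and finishes when the machine halts, Σ is taken to be {a, b}, and the names all_strings, dovetail, and halts_only_on_ab are hypothetical.

```python
from itertools import count, product

def all_strings(alphabet):
    """Enumerate Σ* shortest first, and alphabetically within each length."""
    for n in count(0):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def dovetail(machine, alphabet="ab", max_rounds=None):
    """Return a string on which `machine` halts, searching by dovetailing.

    `machine(s)` must return a generator that yields once per step and
    finishes when the simulated machine halts on s.
    """
    pending = all_strings(alphabet)
    runs = []                                  # one suspended run per string started
    rounds = count() if max_rounds is None else range(max_rounds)
    for _ in rounds:
        s = next(pending)
        runs.append((s, machine(s)))           # start one new string per round
        for started, run in runs:
            try:
                next(run)                      # one more step on this string
            except StopIteration:
                return started                 # some run halted: accept
    return None                                # only reachable when max_rounds is set

def halts_only_on_ab(s):
    yield                                      # first step
    while s != "ab":
        yield                                  # loop forever on any other input

print(dovetail(halts_only_on_ab))
```

With max_rounds left unset, dovetail runs forever when the machine halts on nothing, which is exactly the behavior of a semidecider. The optional bound exists only so that the sketch can be exercised safely.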

Next we show that H ≤_M H_ANY and so H_ANY is not in D. Let R be a mapping
reduction from H to H_ANY defined as follows:

R(<M,w>)  = 

1. Construct the description <M#> of a new Turing machine M#(x) that,
on input x, operates as follows:

1.1. Examine x.

1.2. If x = w, run M on x, else loop.

2. Return <M#>.
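This version of M# does look at its input, which the following sketch tries to make concrete. It assumes, purely for illustration, that a machine is modeled as a Python generator function that yields once per step and finishes when it halts. The names build_m_sharp_any, halts_within, and three_step_machine are hypothetical, and halts_within is only a bounded observation for testing, not a decision procedure.

```python
def build_m_sharp_any(M, w):
    """Model of R(<M, w>) for H_ANY: M#(x) runs M only when x == w."""
    def m_sharp(x):
        if x == w:                  # steps 1.1 and 1.2: examine x
            yield from M(x)         # run M on x (which equals w)
        else:
            while True:             # otherwise loop forever
                yield
    return m_sharp

def halts_within(run, budget):
    """Step a run at most `budget` times; True if it halted in that time."""
    for _ in range(budget):
        try:
            next(run)
        except StopIteration:
            return True
    return False

def three_step_machine(s):
    for _ in range(3):
        yield

m_sharp = build_m_sharp_any(three_step_machine, "ab")
print(halts_within(m_sharp("ab"), 100), halts_within(m_sharp("ba"), 100))
```

Under these assumptions, w is the only input on which M# can halt, so M# halts on something exactly when M halts on w.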

If Oracle exists and decides H_ANY, then C = Oracle(R(<M, w>)) decides H:

• R can be implemented as a Turing machine: The proof is similar to that for
Theorem 21.1. We will omit it in this and future proofs unless it is substantially
different from the one we have already done.

• C is correct: M#'s behavior depends on its input. The only string on which M#
has a chance of halting is w. So:

• If <M, w> ∈ H: M halts on w, so M# halts on w. So there exists at least
one string on which M# halts. Oracle(<M#>) accepts.

• If <M, w> ∉ H: M does not halt on w, so neither does M#. So there exists
no string on which M# halts. Oracle(<M#>) rejects.

But no machine to decide H can exist, so neither does Oracle.

Sometimes there is more than one straightforward reduction that works. For example,
here is an alternative proof that H_ANY is not in D:

Proof: We show that H_ANY is not in D by reduction from H. Let R be a mapping
reduction from H to H_ANY defined as follows:

R(<M,  w>)  = 

1. Construct the description <M#> of a new Turing machine M#(x) that,
on input x, operates as follows:

1.1. Erase the tape.

1.2. Write w on the tape.

1.3. Run M on w.

2. Return <M#>.

If Oracle exists and decides H_ANY, then C = Oracle(R(<M, w>)) decides H.
R can be implemented as a Turing machine. And C is correct. M# ignores its own
input. It halts on everything or nothing. So:

• If <M, w> ∈ H: M halts on w, so M# halts on everything. So it halts on at
least one string. Oracle(<M#>) accepts.

• If <M, w> ∉ H: M does not halt on w, so M# halts on nothing. So there is no
string on which it halts. Oracle(<M#>) rejects.

But no machine to decide H can exist, so neither does Oracle.


Notice that we used the same reduction in this last proof that we used for Theorem 21.1.
This is not uncommon. The fact that a single construction may be the basis for several reduction
proofs is important. It derives from the fact that several quite different looking problems
may in fact be distinguishing between the same two cases.

Recall  the  steps  in  doing  a  reduction  proof  of  undecidability: 

1. Choose an undecidable language L_1 to reduce from.

2. Define the reduction R.

3. Show that the composition of R with Oracle correctly decides L_1.

We make choices at steps 1 and 2. Our last example showed that there may be more
than one reasonable choice for step 2. There may also be more than one reasonable
choice for step 1. So far, we have chosen to reduce from H. But now that we know
other languages that are not in D, we could choose to use one of them. We want to pick
one that makes step 2, constructing R, as straightforward as possible.


THEOREM 21.3 "Does M Halt on Everything?" is Undecidable

Theorem: The language H_ALL = {<M> : Turing machine M halts on Σ*} is not
in D. (Note: H_ALL is also not in SD, which we will show in Section 21.6.)

Proof: We show that H_ε ≤_M H_ALL and so H_ALL is not in D. We have chosen to use
H_ε rather than H because H_ε looks more like H_ALL than H does. Both of them
contain strings composed of a single Turing machine description, without reference
to a particular string w. It is possible to do this proof by reduction from H
instead. We leave that as an exercise.

Let R be a mapping reduction from H_ε to H_ALL defined as follows:

R(<M>) =

1. Construct the description <M#> of a new Turing machine M#(x) that,
on input x, operates as follows:

1.1. Erase the tape.

1.2. Run M.

2. Return <M#>.

If Oracle exists and decides H_ALL, then C = Oracle(R(<M>)) decides H_ε. R
can be implemented as a Turing machine. And C is correct. M# runs M on ε. It
halts on everything or nothing, depending on whether M halts on ε. So:

• If <M> ∈ H_ε: M halts on ε, so M# halts on all inputs. Oracle(<M#>) accepts.

• If <M> ∉ H_ε: M does not halt on ε, so M# halts on nothing. Oracle(<M#>)
rejects.

But no machine to decide H_ε can exist, so neither does Oracle.


Are  safety  and  security  properties  of  complex  systems  decidable?  (J.2) 


We  next  define  a  new  language  that  corresponds  to  the  membership  question  for 
Turing  machines: 

• A = {<M, w> : Turing machine M accepts w}.

Note that A is different from H, since it is possible that M halts but does not accept.
Accepting is a stronger condition than halting. An alternative definition of A is then:

• A = {<M, w> : M is a Turing machine and w ∈ L(M)}.

Recall that, for finite state machines and pushdown automata, the membership question
was decidable. In other words, there exists an algorithm that, given M (an FSM or a
PDA) and a string w, answers the question, "Does M accept w?" We're about to show
that the membership question for Turing machines is undecidable.


THEOREM 21.4 "Does M Accept w?" is Undecidable

Theorem: The language A = {<M, w> : M is a Turing machine and w ∈ L(M)} is
not in D.

Proof: We show that H ≤_M A and so A is not in D. Since H and A are so similar, it
may be tempting to define a mapping reduction R simply as the identity function:

R(<M, w>) =

1. Return <M, w>.

But this won't work, as we see immediately when we try to prove that C =
Oracle(R(<M, w>)) decides H:

• If <M, w> ∈ H: M halts on w. It may either accept or reject. If M accepts w,
then Oracle(<M, w>) accepts. But if M rejects w, Oracle(<M, w>) will reject.

So we cannot guarantee that Oracle will accept whenever M halts on w. We
need to construct R so that it passes to Oracle a machine M# that is guaranteed
both to halt and to accept whenever M would halt on w.

We can make that happen by defining R, a mapping reduction from H to A, as
follows:

R(<M, w>) =

1. Construct the description <M#> of a new Turing machine M#(x) that,
on input x, operates as follows:

1.1. Erase the tape.

1.2. Write w on the tape.

1.3. Run M on w.

1.4. Accept.  /* This step is new. It is important since the hypothesized
Oracle will decide whether M# accepts w, not just whether it halts on w. */

2. Return <M#, w>.  /* Note that R returns not just a description of M#.
It returns a string that encodes both M# and an input string. This is
important since any decider for A will accept only strings of that form.
We chose w somewhat arbitrarily. We could have chosen any other
string, for example ε. */
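The role of the new step 1.4 shows up clearly in a small programming analogy. As a sketch only, assume a Turing machine is modeled as a Python callable that returns True to accept and False to reject (the names build_instance_of_A and rejects_everything are hypothetical). The machine M# accepts whenever M halts, even if M itself rejects.

```python
def build_instance_of_A(M, w):
    """Model of R(<M, w>) for A: return the pair (M#, w)."""
    def m_sharp(x):
        M(w)            # steps 1.1 to 1.3: ignore x and run M on w (may never return)
        return True     # step 1.4: accept, whatever M's own verdict was
    return m_sharp, w

def rejects_everything(s):
    return False        # halts, but rejects

m_sharp, w = build_instance_of_A(rejects_everything, "aba")
print(m_sharp(w))       # True
```

Here M halts on w and rejects, yet M# accepts w, which is what the identity reduction tried first could not guarantee.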

If Oracle exists and decides A, then C = Oracle(R(<M, w>)) decides H. R
can be implemented as a Turing machine. And C is correct. M# ignores its own
input. It accepts everything or nothing, depending on whether it makes it through
the gate at step 1.3, and thus to step 1.4. So:

• If <M, w> ∈ H: M halts on w, so M# accepts everything. In particular, it accepts
w. Oracle(<M#, w>) accepts.

• If <M, w> ∉ H: M does not halt on w. M# gets stuck in step 1.3 and so accepts
nothing. In particular, it does not accept w. Oracle(<M#, w>) rejects.

But no machine to decide H can exist, so neither does Oracle.

We can also define A_ε, A_ANY, and A_ALL in a similar fashion and show that they too
are not in D:


THEOREM 21.5 "Does M Accept ε?" is Undecidable

Theorem: The language A_ε = {<M> : Turing machine M accepts ε} is not in D.
Proof: Analogous to that for H_ε. It is left as an exercise.

THEOREM 21.6 "Does M Accept Anything?" is Undecidable

Theorem: The language A_ANY = {<M> : there exists at least one string that Turing
machine M accepts} is not in D.

Proof: Analogous to that for H_ANY. It is left as an exercise.

The fact that A_ANY is not in D means that there exists no decision procedure for the
emptiness question for the SD languages. Note that, in this respect, they are different from
both the regular and the context-free languages, for which such a procedure does exist.

THEOREM 21.7 "Does M Accept Everything?" is Undecidable

Theorem: The language A_ALL = {<M> : M is a Turing machine and L(M) = Σ*}
is not in D.

Proof: Analogous to that for H_ALL. It is left as an exercise.

So  far,  we  have  discovered  that  many  straightforward  questions  that  we  might  like 
to  ask  about  the  behavior  of  Turing  machines  are  undecidable.  It  should  come  as  no 
surprise then to discover that the equivalence question for Turing machines is also
undecidable. Consider the language:

• EqTMs = {<M_a, M_b> : M_a and M_b are Turing machines and L(M_a) = L(M_b)}.

We  will  show  that  EqTMs  is  not  in  D.  How  can  we  use  reduction  to  do  that?  So  far, 
all  the  languages  that  we  know  are  not  in  D  involve  the  description  of  a  single  Turing 
machine.  So  suppose  that  EqTMs  were  decidable  by  some  Turing  machine  Oracle.  How 
could  we  use  Oracle,  which  compares  two  Turing  machines,  to  answer  questions  about  a 
single  machine,  as  we  must  do  to  solve  H  or  A  or  any  of  the  other  languages  we  have 
been  considering?  The  answer  is  that  our  reduction  R  must  create  a  second  machine  M# 
whose  behavior  it  knows  completely.  Then  it  can  answer  questions  about  some  other 
machine  by  comparing  it  to  M#.  We  illustrate  this  idea  in  our  proof  of  the  next  theorem. 


Consider the problem of virus detection. Suppose that a new virus V is discovered
and its code is determined to be <V>. Is it sufficient for antivirus
software to check solely for occurrences of <V>? (J.4)


THEOREM 21.8 "Are Two Turing Machines Equivalent?" is Undecidable

Theorem: The language EqTMs = {<M_a, M_b> : M_a and M_b are Turing machines
and L(M_a) = L(M_b)} is not in D.

Proof: We show that A_ALL ≤_M EqTMs and so EqTMs is not in D. Let R be a mapping
reduction from A_ALL to EqTMs defined as shown below. Since R must invoke
Oracle on a pair of Turing machines, it will create one new one, M#, which
can be compared to M. The idea is that M# will be designed so that it simply halts
and accepts, whatever its input is. By comparing M to M#, we can determine M's
behavior:

R(<M>)  = 

1. Construct the description <M#> of a new Turing machine M#(x) that,
on input x, operates as follows:

1.1. Accept.

2. Return <M, M#>.

If Oracle exists and decides EqTMs, then C = Oracle(R(<M>)) decides
A_ALL. R can be implemented as a Turing machine. And C is correct. M# accepts
everything. So if L(M) = L(M#), M must also accept everything. So:

• If <M> ∈ A_ALL: L(M) = L(M#). Oracle(<M, M#>) accepts.

• If <M> ∉ A_ALL: L(M) ≠ L(M#). Oracle(<M, M#>) rejects.

But no machine to decide A_ALL can exist, so neither does Oracle.


Consider  the  problem  of  grading  programs  that  are  written  as  exercises  in 
programming classes. We would like to compare each student program to a
"correct" program written by the instructor and accept those programs that
behave  identically  to  the  one  written  by  the  instructor.  Theorem  21.8  says 
that  a  perfect  grading  program  cannot  exist. 


We should point out here that EqTMs is not only not in D, it is also not in SD. We
leave  the  proof  of  that  as  an  exercise. 


21.2.3  Reductions  That  are  Not  Mapping  Reductions 

The general definition of reducibility that we provided at the beginning of Section 21.2
is strictly more powerful than the more restricted notion of mapping reducibility that
we have used in the examples that we have considered so far. We'll next consider a case
where no mapping reduction exists, but a more general one does. Recall that the more
general definition of a reduction from L_1 to L_2 may consist of two or more functions
that can be composed to decide L_1 if Oracle exists and decides L_2. We'll see that one
particularly useful thing to do is to exploit a second function that applies to the output
of Oracle and flips it (i.e., it turns an Accept into a Reject and vice versa).


THEOREM  21.9  "Does  M  Accept  No  Even  Length  Strings?"  is  Undecidable 

Theorem: The language L_2 = {<M> : Turing machine M accepts no even length
strings} is not in D.

Proof: We show that H ≤ L_2 and so L_2 is not in D. As in the other examples we
have considered so far, we need to define a reduction from H to L_2. But this
time we are going to run into a glitch. We can try to implement a straightforward
mapping reduction R between H and L_2, just as we have done for our
other examples. But, when we do that and then pass its result to Oracle, we'll
see that Oracle will return the opposite of the answer we need to decide H. But
that is an easy problem to solve. Since Oracle is (claimed to be) a deciding machine,
it always halts. So we can add to the reduction a second Turing machine
that runs after Oracle and just inverts Oracle's response. We'll call that second
machine simply ¬. Define:

R(<M, w>) =

1. Construct the description <M#> of a new Turing machine M#(x) that,
on input x, operates as follows:

1.1. Erase the tape.

1.2. Write w on the tape.

1.3. Run M on w.

1.4. Accept.

2. Return <M#>.

{R, ¬} is a reduction from H to L_2. If Oracle exists and decides L_2, then
C = ¬Oracle(R(<M, w>)) decides H. R and ¬ can be implemented as Turing
machines. And C is correct. M# ignores its own input. It accepts everything or
nothing, depending on whether it makes it to step 1.4. So:

• If <M, w> ∈ H: M halts on w, so M# accepts everything, including some even
length strings. Oracle(<M#>) rejects, so C accepts.

• If <M, w> ∉ H: M does not halt on w. M# gets stuck in step 1.3 and accepts nothing,
and so, in particular, no even length strings. Oracle(<M#>) accepts. So C rejects.

But no machine to decide H can exist, so neither does Oracle. So L_2 is not in D.
It is also not in SD. We leave the proof of that as an exercise.
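The shape of such a two-part reduction, a transformation R followed by a flip of Oracle's answer, can be run on decidable toy languages. The following is a sketch under assumptions that are not part of the text: the toy L_1 is the set of strings over {a, b} that contain at least one a, the toy L_2 is the set of strings over {0, 1} that contain no 1, and the names R, oracle, negate, and C are hypothetical.

```python
def R(x: str) -> str:
    """Transform a string over {a, b} into a string over {0, 1}."""
    return x.replace("a", "1").replace("b", "0")

def oracle(y: str) -> bool:
    """Decider for the toy L_2: strings that contain no 1."""
    return "1" not in y

def negate(answer: bool) -> bool:
    """The second machine: turn Accept into Reject and vice versa."""
    return not answer

def C(x: str) -> bool:
    """C = ¬Oracle(R(x)) decides the toy L_1."""
    return negate(oracle(R(x)))

print(C("bab"), C("bbb"))   # True False
```

The flip is safe only because oracle always returns an answer. Applied to a semidecider that might run forever, negate would never get the chance to run.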

We have just shown that there exists a reduction from H to L_2 = {<M> : Turing
machine M accepts no even length strings}. It is possible to prove that this is a case
where it was necessary to exploit the greater power offered by the more general
definition of reducibility. We leave as an exercise the proof that no mapping reduction
from H to L_2 exists. To see why this might be so, it is important to keep in mind
the definition of mapping reducibility: L_1 ≤_M L_2 iff there exists some computable
function f such that:

∀x (x ∈ L_1 iff f(x) ∈ L_2).

Note that, if such an f exists, it is also a mapping reduction from ¬L_1 to ¬L_2.


21.3  Are  All  Questions  About  Turing  Machines  Undecidable? 

By  now  it  should  be  clear  that  many  interesting  properties  of  the  behavior  of  Turing 
machines  are  undecidable.  Is  it  true  that  any  question  that  asks  about  a  Turing  machine 
or  its  behavior  is  undecidable?  No. 

First, we observe that questions that ask just about a Turing machine's physical structure,
rather than about its behavior, are likely to be decidable.


EXAMPLE  21.8  The  Number  of  States  of  M  is  Decidable 

Let L_A = {<M> : Turing machine M contains an even number of states}. L_A is
decidable by the following procedure:

1. Make a pass through <M>, counting the number of states in M.

2.  If  even,  accept;  else  reject. 
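A decider of this kind only reads the description and never runs the machine. The sketch below assumes a hypothetical encoding in which a description is a Python dict with a "states" list and a "delta" transition table; that encoding and the name has_even_number_of_states are illustrative choices, not the encoding used in the text.

```python
def has_even_number_of_states(description: dict) -> bool:
    """Decide L_A by inspecting the description only; M is never run."""
    return len(set(description["states"])) % 2 == 0

looper = {
    "states": ["q0", "q1"],
    # On the symbol a, bounce between q0 and q1 forever.
    "delta": {("q0", "a"): ("q1", "a", "R"), ("q1", "a"): ("q0", "a", "L")},
}
print(has_even_number_of_states(looper))   # True
```

The example machine can run forever on suitable inputs, yet the question is answered immediately, because the answer depends on structure rather than behavior.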


Next we'll consider two questions that do ask about a Turing machine's behavior but
are, nevertheless, decidable.

EXAMPLE 21.9 Whether M Halts in Some Fixed Time is Decidable

Let L_B = {<M, w> : Turing machine M halts on w within 3 steps}. L_B is decidable
by the following procedure:

1. Simulate M on w for 3 steps.

2.  If  it  halted,  accept;  else  reject. 
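Bounded simulation always terminates, which is why this procedure is a decider. A minimal sketch follows, with several assumptions that are not taken from the text: a single-tape deterministic machine is given as a transition table, the head starts on the first input character (a simplification of the head convention used in the book), and the names halts_within and scan_right are hypothetical.

```python
BLANK = "□"

def halts_within(delta, start, halting, w, max_steps=3):
    """Return True iff the machine halts on w after at most max_steps steps.

    delta maps (state, symbol) to (new_state, new_symbol, move), where move
    is "L" or "R". The machine halts when it enters a state in `halting` or
    when no transition applies.
    """
    tape = dict(enumerate(w))            # sparse tape: position -> symbol
    state, head = start, 0

    def halted():
        return state in halting or (state, tape.get(head, BLANK)) not in delta

    for _ in range(max_steps):
        if halted():
            return True
        state, symbol, move = delta[(state, tape.get(head, BLANK))]
        tape[head] = symbol
        head += 1 if move == "R" else -1
    return halted()

# A machine that scans right over a's and halts at the first other symbol.
scan_right = {("q0", "a"): ("q0", "a", "R")}
print(halts_within(scan_right, "q0", set(), "aa"),
      halts_within(scan_right, "q0", set(), "aaaa"))
```

The loop body runs at most max_steps times whatever the machine does, so the procedure halts on every input. Removing the bound would turn it back into a semidecider for H.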


EXAMPLE  21.10  Exactly  How  M  Works  May  be  Decidable 

Let L_C = {<M, w> : Turing machine M moves right exactly twice while running
on w}.

Notice that M must move either to the right or the left on each move. We
make the usual assumption that M's read/write head is positioned immediately
to the left of the leftmost input character when M starts. If M cannot move right
more than twice, it can read no more than two characters of its input. But it may
loop forever moving left. As it moves left, it can write on the tape, but it cannot
go back more than two squares to read what it has written. So the only part of the
tape that can affect M's future behavior is the current square, two squares to the
right and two squares to the left (since all other squares to the left still contain □).
Let K be the set of states of M and let Γ be M's tape alphabet. Then the number
of effectively distinct configurations of M is maxconfigs = |K_M| · |Γ_M|^5. If we
simulate M running for maxconfigs moves, it will have entered, at least once,
each configuration that it is ever going to reach. If it has not halted, then it is in
an infinite loop. Each time through the loop it will do the same thing it did the
last time.

If, in simulating maxconfigs moves, M moved right more than twice, we can reject.
If it did not move right at all, or if it moved right once, we can reject. If it
moved right twice, we need to find out whether either of those moves occurred
during some loop. We can do that by running M for up to maxconfigs more moves.
In the extreme case of a maximally long loop, it will move right once more. If
there is a shorter loop, M may move right several times more. So the following
procedure decides L_C:

1. Run M on w for |K_M| · |Γ_M|^5 moves or until M halts or moves right three times:

1.2. If M moved right exactly twice, then:

Run M on w for another |K_M| · |Γ_M|^5 moves or until it moves right.

If M moved right any additional times, reject; otherwise accept.

1.3. If M moved right some other number of times, reject.


What is different about languages such as L_A, L_B, and L_C (in contrast to H, H_ε,
H_ANY, H_ALL, and the other languages we have proven are not in D)? The key is that, in
the case of L_A, the question is not about M's behavior at all. It involves just its structure.
In the case of L_B and L_C, the question we must answer is not about the language
that the Turing machine M halts on or accepts. It is about a detail of M's behavior as it
is computing. In the case of L_B, it has to do with the exact number of steps in which M
might halt. In the case of L_C, it is about the way that M goes about solving the problem
(specifically how often it moves right). It turns out that questions like those can be decided.
We'll see, though, in Section 21.5, that we must be careful about this. Some questions
that appear to be about the details of how M operates can be recast as questions
about M's output and so are not decidable.


Rice's Theorem, which we present next, articulates the difference between languages
like H and languages like L_A, L_B, and L_C.


21.4 Rice's Theorem *

Consider  the  set  SD  of  semidecidable  languages.  Suppose  that  we  want  to  ask  any  of 
the  following  questions  about  some  language  L  in  that  set: 

• Does L contain some particular string w?

• Does L contain ε?

• Does L contain any strings at all?

• Does L contain all strings over some alphabet Σ?

In order to consider building a program to answer any of those questions, we first
need a way to specify formally what L is. Since the SD languages are, by definition, exactly
the languages that can be semidecided by some Turing machine, one way to specify
a language L is to give a semideciding Turing machine for it. If we do that, then we
can restate each of those questions as:

• Given a semideciding Turing machine M, does M accept some particular string w?

• Given a semideciding Turing machine M, does M accept ε?

• Given a semideciding Turing machine M, does M accept anything?

• Given a semideciding Turing machine M, does M accept all strings in Σ*?

We  can  encode  each  of  those  decision  problems  as  a  language  to  be  decided,  yielding: 

• A = {<M, w> : Turing machine M accepts w}.

• A_ε = {<M> : Turing machine M accepts ε}.

• A_ANY = {<M> : there exists at least one string that Turing machine M accepts}.

• A_ALL = {<M> : Turing machine M accepts all inputs}.

We have already seen that none of these languages is in D, so none of the corresponding
questions is decidable. Rice's Theorem, which we are about to state and prove, tells
us that not only these languages, but any language that can be described as {<M> :
P(L(M)) = True}, for any nontrivial property P, is not in D. By a nontrivial property we
mean a property that is not simply True for all languages or False for all languages.

But we can state Rice's Theorem even more generally than that. The questions we
have just considered are questions we can ask of any semidecidable language,
independently of how we describe it. We have used semideciding Turing machines as our
descriptions. But we could use some other descriptive form. (For example, in Chapter
23, we will consider a grammar formalism that describes exactly the SD languages.)
The key is that the property we are evaluating is a property of the language itself and
not a property of some particular Turing machine that happens to semidecide it.

So an alternative way to state Rice's Theorem is:

No  nontrivial  property  of  the  SD  languages  is  decidable. 

Just as languages that are defined in terms of the behavior of Turing machines
are generally not decidable, functions that describe the way that Turing machines
behave are likely not to be computable. See, for example, the busy
beaver functions described in Section 25.1.4.




To use Rice's Theorem to show that a language L of the form {<M> : P(L(M)) = True}
is not in D we must:

•  Specify  property  P. 

•  Show  that  the  domain  of  P  is  the  set  of  SD  languages. 

•  Show  that  P  is  nontrivial: 

•  P  is  true  of  at  least  one  language. 

•  P  is  false  of  at  least  one  language. 

Let M, Ma, and Mb be Turing machines. We'll consider each of the following
languages and see whether Rice's Theorem applies to it:

1. {<M> : M is a Turing machine and L(M) contains only even length strings}.

2. {<M> : M is a Turing machine and L(M) contains an odd number of strings}.

3. {<M> : M is a Turing machine and L(M) contains all strings that start with a}.

4. {<M> : M is a Turing machine and L(M) is infinite}.

5. {<M> : M is a Turing machine and L(M) is regular}.

6. {<M> : Turing machine M contains an even number of states}.

7. {<M> : Turing machine M has an odd number of symbols in its tape alphabet}.

8. {<M> : Turing machine M accepts ε within 100 steps}.

9. {<M> : Turing machine M accepts ε}.

10. {<Ma, Mb> : Ma and Mb are Turing machines and L(Ma) = L(Mb)}.

In cases 1 through 5, we can easily state P. For example, in case 1, P is “True if L
contains only even length strings and False otherwise”. In all five cases, the domain of
P is the set of SD languages and P is nontrivial. For example, in case 1, P is True of
{aa, bb} and False of {a, aa}.

But now consider cases 6 through 8. In case 6, P is, “True if M has an even number of
states and False otherwise”. P is no longer a property of a language. It is a property of
some specific machine, independent of the language that the machine accepts. The
same is true in cases 7 and 8. So Rice's Theorem tells us nothing about whether those
languages are in D. They may or may not be. As it turns out, all three of these examples
are in D and languages that look like them usually are. But Rice's Theorem does not
tell us that. It simply tells us nothing.
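
Cases 6 and 8 are decidable because each can be answered either by inspecting the machine's description or by simulating the machine for a bounded number of steps. The following Python sketch illustrates both ideas. The dictionary-based encoding of a single-tape machine and the names TM, even_number_of_states, and accepts_epsilon_within are assumptions made only for this illustration; they are not the encoding used elsewhere in this book.

```python
from dataclasses import dataclass

BLANK = "_"

@dataclass
class TM:
    states: set
    start: str
    accept: str
    # delta maps (state, symbol) -> (new_state, symbol_to_write, move), move in {-1, +1}
    delta: dict

def even_number_of_states(m: TM) -> bool:
    """Case 6: decided by inspecting the description; M is never run."""
    return len(m.states) % 2 == 0

def accepts_epsilon_within(m: TM, bound: int) -> bool:
    """Case 8: decided by simulating M on the empty tape for at most `bound` steps."""
    tape, head, state = {}, 0, m.start
    for _ in range(bound):
        if state == m.accept:
            return True
        key = (state, tape.get(head, BLANK))
        if key not in m.delta:       # no applicable transition: M halts without accepting
            return False
        state, written, move = m.delta[key]
        tape[head] = written
        head += move
    return state == m.accept

# A machine that moves right forever, and one that accepts after two steps.
runaway = TM({"q0", "qa", "qr"}, "q0", "qa", {("q0", BLANK): ("q0", BLANK, +1)})
quick = TM({"q0", "q1", "qa", "qr"}, "q0", "qa",
           {("q0", BLANK): ("q1", BLANK, +1), ("q1", BLANK): ("qa", BLANK, +1)})
```

Both functions always return, whatever machine they are given, because neither ever runs the machine for an unbounded number of steps. No comparable trick is available for case 9, where the step bound is absent.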

Next consider case 9. In form, it looks something like case 8. But it is in fact like 1-5.
An alternative way to state P is, “ε ∈ L(M)”. It is not the wording of the description
that matters. It is the property P itself that counts.

Finally consider case 10. We have already shown that this language, which we have
named EqTMs, is not in D. But Rice's Theorem does not tell us that. Again, it says
nothing since now we are asking about a property whose domain is SD × SD, rather
than simply SD. So when Rice's Theorem doesn't apply, it is possible that we are
dealing with a language in D. It is also possible we are dealing with one not in D. Without
additional investigation, we just don't know.

Rice's Theorem is not going to give us a way to prove anything we couldn't have
proven with reduction. Although it is an alternative proof strategy, its main value is its
insight. We know immediately, when confronted with a question about the SD
languages, that it will not be decidable.

The proof of Rice's Theorem is by reduction from H. It is a bit more complex than
any of the reductions we have done so far, but the principle is exactly the same. What
we are going to do is to show that if it were possible to decide any property P (without
regard to what P is except that it is nontrivial), then it would be possible to decide H. It
may seem surprising that we can show this without appeal to any information about
what P tells us. But we can.


THEOREM  21.10  Rice's  Theorem 

Theorem: For any nontrivial property P, the language L = {<M> : P(L(M)) = True} is
not in D.

Proof: We prove Rice's Theorem by showing that H ≤M L. Let P be any nontrivial
property of the SD languages. We do not know what P is. But, whatever it is,
either P(∅) = True or P(∅) = False. Assume it is False. We leave the proof that
the theorem holds if P(∅) = True as an exercise. Since P is nontrivial, there is
some SD language LT such that P(LT) is True. Since LT is in SD, there exists
some Turing machine K that semidecides it.

We need to define a mapping reduction R from H to L. The main idea in this
reduction is that R will build a machine M# that first runs M on w as a sort of
filter. If it makes it by that step, then it considers its own input and either accepts it
or not. If we can design M#'s action at that second step so that we can tell
whether it makes it there, we will know whether M halted on w.

Let  R  be  a  reduction  from  H  to  L  defined  as  follows: 

R(<M, w>) =

1. Construct the description <M#> of a new Turing machine M#(x) that,
on input x, operates as follows:

1.1. Copy its input x to a second tape.

1.2. Erase the tape.

1.3. Write w on the tape.

1.4. Run M on w.

1.5. Put x back on the first tape and run K (the Turing machine that
semidecides LT, a language of which P is True) on x.

2. Return <M#>.

If Oracle exists and decides L, then C = Oracle(R(<M, w>)) decides H. R
can be implemented as a Turing machine. And C is correct:

• If <M, w> ∈ H: M halts on w, so M# makes it to step 1.5. So M# does
whatever K would do. So L(M#) = L(K) and P(L(M#)) = P(L(K)). We chose K
precisely to assure that P(L(K)) is True, so P(L(M#)) must also be True.
Oracle decides P. Oracle(<M#>) accepts.




• If <M, w> ∉ H: M does not halt on w. M# gets stuck in step 1.4 and so
accepts nothing. L(M#) = ∅. By assumption, P(∅) = False. Oracle decides P.
Oracle(<M#>) rejects.

But  no  machine  to  decide  H  can  exist,  so  neither  does  Oracle. 
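
The construction of M# can be mirrored in an ordinary programming language. In the Python sketch below, Turing machines are modeled as Python functions, which is an assumption of this illustration: running M on w is the call m(w), which may never return, and K returns True when it accepts. The name build_m_sharp and the sample machines are hypothetical.

```python
def build_m_sharp(m, w, k):
    """Model of R(<M, w>): return M#, which gates K behind a run of M on w."""
    def m_sharp(x):
        m(w)          # step 1.4: if M never halts on w, control never gets past here
        return k(x)   # step 1.5: otherwise behave exactly like K on the original input
    return m_sharp

# K accepts a language of which some property P is True; here, strings of even length.
def k(x):
    return len(x) % 2 == 0

def halts_at_once(w):
    return None

def never_halts(w):
    while True:
        pass

m_sharp = build_m_sharp(halts_at_once, "abc", k)
print(m_sharp("ab"), m_sharp("a"))   # prints: True False
# build_m_sharp(never_halts, "abc", k) returns a function that never returns on any
# input, so the language it accepts is empty.
```

Note that build_m_sharp itself always returns, even when m does not halt on w. It only constructs M#. It never runs M, which is why R can be implemented as a Turing machine.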

Now  that  we  have  proven  the  theorem,  we  can  use  it  as  an  alternative  to  reduction 
in  proving  that  a  language  L  is  not  in  D. 


THEOREM 21.11 "Is L(M) Regular?" is Undecidable

Theorem: Given a Turing machine M, the question, “Is L(M) regular?” is not
decidable. Alternatively, the language TMREG = {<M> : M is a Turing machine
and L(M) is regular} is not in D.

Proof: By Rice's Theorem. We define P as follows:

•  Let  P  be  defined  on  the  set  of  languages  accepted  by  some  Turing  machine  M. 
Let  it  be  True  if  L(M)  is  regular  and  False  otherwise. 

•  The  domain  of  P  is  the  set  of  SD  languages  since  it  is  the  set  of  languages  ac¬ 
cepted  by  some  Turing  machine. 

•  P  is  nontrivial: 

•  P(a*)  =  True. 

•  P(AnBn)  =  False. 

Thus we can conclude that {<M> : M is a Turing machine and L(M) is regular}
is not in D.

We can also prove this by reduction from H. The reduction we will use exploits
two functions, R and ¬. R will map instances of H to instances of TMREG. It will
use a strategy that is very similar to the one we used in proving Rice's Theorem.
As in the proof of Theorem 21.9, ¬ will simply invert Oracle's response (turning
an Accept into a Reject and vice versa). So define:

R(<M, w>) =

1. Construct the description <M#> of a new Turing machine M#(x) that,
on input x, operates as follows:

1.1. Copy its input x to a second tape.

1.2. Erase the tape.

1.3. Write w on the tape.

1.4. Run M on w.

1.5. Put x back on the first tape.

1.6. If x ∈ AnBn then accept, else reject.

2. Return <M#>.

{R, ¬} is a reduction from H to TMREG. If Oracle exists and decides TMREG,
then C = ¬Oracle(R(<M, w>)) decides H. R and ¬ can be implemented as
Turing machines. In particular, it is straightforward to build a Turing machine
that decides whether a string x is in AnBn. And C is correct:

• If <M, w> ∈ H: M halts on w, so M# makes it to step 1.5. Then it accepts x iff
x ∈ AnBn. So M# accepts AnBn, which is not regular. Oracle(<M#>) rejects.
C accepts.

• If <M, w> ∉ H: M does not halt on w. M# gets stuck in step 1.4 and so accepts
nothing. L(M#) = ∅, which is regular. Oracle(<M#>) accepts. C rejects.

But  no  machine  to  decide  H  can  exist,  so  neither  does  Oracle. 
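
The claim that membership in AnBn is easy to decide, and the way that test sits behind the gate in M#, can be sketched in Python. As before, modeling a Turing machine as a Python function that may never return is an assumption of this illustration, and the names in_anbn and build_m_sharp are hypothetical.

```python
def in_anbn(x: str) -> bool:
    """Decide membership in AnBn = {a^n b^n : n >= 0}. Always halts."""
    n, remainder = divmod(len(x), 2)
    return remainder == 0 and x == "a" * n + "b" * n

def build_m_sharp(m, w):
    """Model of R(<M, w>) for TMREG: M# runs M on w, then tests its own input."""
    def m_sharp(x):
        m(w)                # step 1.4: the gate; never returns if M does not halt on w
        return in_anbn(x)   # step 1.6: accept iff x is in AnBn
    return m_sharp
```

If m halts on w, the function returned by build_m_sharp accepts exactly AnBn. If m does not halt on w, that function never returns, so it accepts nothing.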

It turns out that we can also make a stronger statement about TMREG. It is not only
not in D, it is not in SD. We leave the proof of that as an exercise.


21.5  Undecidable  Questions  About  Real  Programs 

The real practical impact of the undecidability results that we have just presented is the
following: The programming environments that we actually use every day are equal in
computational power to the Turing machine. So questions that are undecidable when
asked about Turing machines are equally undecidable when asked about Java
programs or C++ programs, or whatever. The undecidability of a question about real
programs can be proved by reduction from the corresponding question about Turing
machines. We'll show one example of doing this.

THEOREM  21.12  "Are  Two  Programs  Equivalent?"  is  Undecidable 

Theorem: The language EqPrograms = {<Pa, Pb> : Pa and Pb are programs in
any standard programming language PL and L(Pa) = L(Pb)} is not in D.

Proof: Recall that EqTMs = {<Ma, Mb> : Ma and Mb are Turing machines and
L(Ma) = L(Mb)}. We show that EqTMs ≤M EqPrograms and so EqPrograms is
not in D (since EqTMs isn't). It is straightforward to build, in any standard
programming language, an implementation of the universal Turing machine U. Call
that program SimUM. Now let R be a mapping reduction from EqTMs to
EqPrograms defined as follows:

R(<Ma, Mb>) =

1. Build P1, a PL program that, on input w, invokes SimUM(Ma, w) and
returns its result.

2. Build P2, a PL program that, on input w, invokes SimUM(Mb, w) and
returns its result.

3. Return <P1, P2>.

If Oracle exists and decides EqPrograms, then C = Oracle(R(<Ma, Mb>))
decides EqTMs. R can be implemented as a Turing machine. And C is correct.
L(P1) = L(Ma) and L(P2) = L(Mb). So:

• If <Ma, Mb> ∈ EqTMs: L(Ma) = L(Mb). So L(P1) = L(P2). Oracle(<P1, P2>)
accepts.

• If <Ma, Mb> ∉ EqTMs: L(Ma) ≠ L(Mb). So L(P1) ≠ L(P2). Oracle(<P1, P2>)
rejects.

But  no  machine  to  decide  EqTMs  can  exist,  so  neither  does  Oracle. 
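
To make the role of SimUM concrete, here is a minimal Python sketch of the reduction. The machine encoding (a dictionary with a start state, a single accepting state, and a transition table) and the names sim_um and reduce_eq_tms are assumptions of this illustration. The function sim_um is a deliberately small stand-in for a universal machine, not a full implementation of U.

```python
BLANK = "_"

def sim_um(machine, w):
    """Simulate a single-tape machine on w; return True on accept, False on reject.
    May run forever, exactly as the simulated machine may."""
    delta, state, accept = machine["delta"], machine["start"], machine["accept"]
    tape, head = dict(enumerate(w)), 0
    while state != accept:
        key = (state, tape.get(head, BLANK))
        if key not in delta:
            return False              # halted in a non-accepting configuration
        state, written, move = delta[key]
        tape[head] = written
        head += move
    return True

def reduce_eq_tms(ma, mb):
    """Model of R(<Ma, Mb>): build P1 and P2, each a thin wrapper around sim_um."""
    def p1(w):
        return sim_um(ma, w)
    def p2(w):
        return sim_um(mb, w)
    return p1, p2

# Two different machines that both accept exactly the strings over {a, b} starting with a.
ma = {"start": "s", "accept": "y", "delta": {("s", "a"): ("y", "a", +1)}}
mb = {"start": "s", "accept": "y",
      "delta": {("s", "a"): ("t", "a", +1), ("t", "a"): ("y", "a", +1),
                ("t", "b"): ("y", "b", +1), ("t", BLANK): ("y", BLANK, +1)}}
p1, p2 = reduce_eq_tms(ma, mb)
```

Comparing p1 and p2 on a handful of inputs can reveal a difference but can never establish equivalence, which is the point of the theorem.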


The  United  States  Patent  Office  issues  patents  on  software.  But,  before  the 
Patent  Office  can  issue  any  patent,  it  must  check  for  prior  art.  The  theorem 
we  have  just  proved  suggests  that  there  can  exist  no  general  purpose  pro¬ 
gram  that  can  do  that  checking  automatically. 


Because  the  undecidability  of  questions  about  real  programs  follows  from  the  unde¬ 
cidability  of  those  questions  for  Turing  machines,  we  can  show,  for  example,  that  all  of 
the  following  questions  are  undecidable: 

1. Given a program P and input x, does P, when running on x, halt?

2. Given a program P, might P get into an infinite loop on some input?

3. Given a program P and input x, does P, when running on x, ever output a 0? Or
anything at all?

4. Given two programs, P1 and P2, are they equivalent?

5. Given a program P, input x, and a variable n, does P, when running on x, ever
assign a value to n? We need to be able to answer this question if we want to be able
to guarantee that every variable is initialized before it is used.

6. Given a program P and code segment S in P, does P ever reach S on any input (in
other words, can we chop S out)?

7. Given a program P and code segment S in P, does P reach S on every input (in
other words, can we guarantee that S happens)?

We've already proved that questions 1, 2, and 4 are undecidable for Turing
machines. Question 3 (about printing 0) is one that Turing himself asked and showed to be
undecidable. We leave that proof as an exercise.


Is it possible to build a program verification system that can determine, given
an arbitrary specification S and program P, whether or not P correctly
implements S? (H.1)


But what about questions 5, 6, and 7? They appear to be about details of how a
program operates, rather than about the result of running the program (i.e., the
language it accepts or the function it computes). We know that many questions of that
sort are decidable, either by inspecting the program or by running it for some
bounded number of steps. So why are these questions undecidable? Because they
cannot be answered either by inspection or by bounded simulation. We can prove
that each of them is undecidable by showing that some language that we already
know is not in D can be reduced to it. To do this, we'll return to the Turing machine
representation for programs. We'll show that question 6 is undecidable and leave
the others as exercises.


Can a compiler check for dead code and eliminate it? (G.4.3)


THEOREM  21.13  "Does  M  Ever  Reach  Some  Particular  State?" 
is  Undecidable 


Theorem: The language L = {<M, q> : Turing machine M reaches state q on
some input} is not in D.

Proof: We show that HANY ≤M L and so L is not in D. Let R be a mapping
reduction from HANY to L defined as follows:

R(<M>) =

1. From <M>, construct the description <M#> of a new Turing machine
M# that will be identical to M except that, if M has a transition ((q1, c1),
(q2, c2, a)) and q2 is a halting state other than h, replace that transition
with ((q1, c1), (h, c2, a)).

2. Return <M#, h>.

If Oracle exists and decides L, then C = Oracle(R(<M>)) decides HANY. R
can be implemented as a Turing machine. And C is correct: M# will reach the
halting state h iff M would reach some halting state. So:

• If <M> ∈ HANY: There is some string on which M halts. So there is some
string on which M# reaches state h. Oracle(<M#, h>) accepts.

• If <M> ∉ HANY: There is no string on which M halts. So there is no string on
which M# reaches state h. Oracle(<M#, h>) rejects.

But  no  machine  to  decide  HANY  can  exist,  so  neither  does  Oracle. 
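
The rewriting step that R performs is purely syntactic, which is why R can be implemented as a Turing machine. A minimal Python sketch follows. The representation of the transition function as a dictionary from (q1, c1) to (q2, c2, a), and the name redirect_to_h, are assumptions made for this illustration.

```python
def redirect_to_h(transitions, halting_states, h="h"):
    """Return M#'s transitions: every move into a halting state goes to h instead.
    transitions maps (q1, c1) -> (q2, c2, a)."""
    return {
        (q1, c1): ((h if q2 in halting_states else q2), c2, a)
        for (q1, c1), (q2, c2, a) in transitions.items()
    }

# A small transition table with two halting states, y and n.
m = {
    ("q0", "a"): ("q1", "a", "R"),
    ("q1", "b"): ("y", "b", "R"),
    ("q1", "a"): ("n", "a", "R"),
}
m_sharp = redirect_to_h(m, halting_states={"y", "n", "h"})
```

The function examines each transition once and never simulates M, so it halts on every input.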


21.6  Showing  That  a  Language  is  Not  Semidecidable 

We know, from Theorem 20.3, that there exist languages that are not in SD. In fact, we
know that there are uncountably many of them. And we have seen one specific
example, ¬H. In this section we will see how to show that other languages are also not in SD.
Although we will first discuss a couple of other methods of proving that a language is
not in SD (which we will also write as in ¬SD), we will again make extensive use of
reduction. This time, the basis for our reduction proofs will be ¬H. We will show that if
some new language L were in SD, ¬H would also be. But it is not.

Before we try to prove that a language L is not in SD (or that it is), we need an
intuition that tells us what to prove. A good way to develop such an intuition is to think
about trying to write a program to solve the problem. Languages that are not in SD
generally involve either infinite search, or knowing that a Turing machine will loop
forever, or both. For example, the following languages are not in SD:

• ¬H = {<M, w> : Turing machine M does not halt on w}. To solve this one by
simulation, we would have to run M forever.

• {<M> : L(M) = Σ*}. To solve this one by simulation, we would have to try all
strings in Σ*. But there are infinitely many of them.

• {<M> : there does not exist a string on which Turing machine M halts}. To solve
this one by simulation, we would have to try an infinite number of strings and show
that all of them fail to halt. Even to show that one fails to halt would require an
infinite number of steps.

In  the  rest  of  this  section,  we  present  a  collection  of  techniques  that  can  be  used  to 
prove  that  a  language  is  not  in  SD. 


21.6.1 Techniques Other Than Reduction ♦

Sometimes  we  can  show  that  a  language  L  is  not  in  SD  by  giving  a  proof  by  contradic¬ 
tion  that  does  not  exploit  reduction.  We  will  show  one  such  example  here.  For  this  ex¬ 
ample,  we  need  to  make  use  of  a  theorem  that  we  will  prove  in  Section  25.3:  The 
recursion  theorem  tells  us  that  there  exists  a  subroutine,  obtainSelf ,  available  to  any 
Turing  machine  M,  that  constructs  <M>,  the  description  of  M. 

We have not so far said anything about minimizing Turing machines. The reason is that
no algorithm to do so exists. In fact, given a Turing machine M, it is undecidable whether
M is minimal. Alternatively, the language of descriptions of minimal Turing machines is
not in SD. More precisely, define a Turing machine M to be minimal iff there exists no
other Turing machine M' such that |<M'>| < |<M>| and M' is equivalent to M.

THEOREM  21.14  "Is  M  Minimal?"  is  Not  Semidecidable 

Theorem: The language TMMIN = {<M> : Turing machine M is minimal} is not
in SD.

Proof: If TMMIN were in SD, then (by Theorem 20.8) there would exist some
Turing machine ENUM that enumerates its elements. Define the following Turing
machine:

M#(x) =

1. Invoke obtainSelf to produce <M#>.

2. Run ENUM until it generates the description of some Turing machine
M' whose description is longer than |<M#>|.

3. Invoke the universal Turing machine U on the string <M', x>.

Since TMMIN is infinite, ENUM must eventually generate a string that is
longer than |<M#>|. So M# makes it to step 3 and so is equivalent to M' since it
simulates M'. But, since |<M#>| < |<M'>|, M' cannot be minimal. Yet it was
generated by ENUM. Contradiction.

Another way to prove that a language is not in SD is to exploit Theorem 20.6, which
tells us that a language L is in D iff both it and its complement, ¬L, are in SD. This is
true because, if we could semidecide both L and ¬L, we could run the two semideciders
in parallel, wait until one of them halts, and then either accept (if the semidecider for L
accepted) or reject (if the semidecider for ¬L accepted).
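
Running two semideciders "in parallel" can be done on a single machine by alternating single steps of each. The Python sketch below models a semidecider as a generator that yields once per step of work and returns when it accepts. This modeling choice, and the names decide, semi_even, and semi_odd, are assumptions of the illustration.

```python
from itertools import count

def decide(semi_l, semi_not_l, x):
    """Alternate single steps of both semideciders; exactly one eventually accepts."""
    run_l, run_not_l = semi_l(x), semi_not_l(x)
    while True:
        if next(run_l, "accepted") == "accepted":
            return True       # the semidecider for L accepted
        if next(run_not_l, "accepted") == "accepted":
            return False      # the semidecider for the complement accepted

# Example: L is the set of even natural numbers. Each semidecider searches forever
# when its input is not in the language it semidecides.
def semi_even(x):
    for k in count():
        if 2 * k == x:
            return            # accept
        yield                 # one more step of work, no answer yet

def semi_odd(x):
    for k in count():
        if 2 * k + 1 == x:
            return            # accept
        yield
```

Running either semidecider to completion by itself could loop forever. Alternating steps guarantees that whichever one is going to accept gets the chance to do so.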

So  suppose  that  we  are  considering  some  language  L.  We  want  to  know  whether  L 
is  in  SD  and  we  already  know: 

• ¬L is in SD, and

• at least one of L or ¬L is not in D.

Then we can conclude that L is not in SD, because, if it were, it would force both
itself and its complement into D, and we know that cannot be true. This is the technique
that we used to prove that ¬H is not in SD. We can use it for some other languages as
well, which we will do in our proof of the next theorem.

THEOREM  21.15  "Does  There  Exist  No  String  On  Which  M  Halts?" 
is  Not  Semidecidable 


Theorem: H¬ANY = {<M> : there does not exist any string on which Turing
machine M halts} is not in SD.

Proof: Recall that we said, at the beginning of Chapter 19, that we would define the
complement of a language of Turing machine descriptions with respect to the
universe of syntactically valid Turing machine descriptions. So the complement of
H¬ANY is HANY = {<M> : there exists at least one string on which Turing
machine M halts}. From Theorem 21.2, we know:

• ¬H¬ANY (namely, HANY) is in SD.

• ¬H¬ANY (namely, HANY) is not in D.

So H¬ANY is not in SD because, if it were, then HANY would be in D but it isn't.


21.6.2  Reduction 

The  most  general  technique  that  we  can  use  for  showing  that  a  language  is  not  in  SD  is 
reduction.  Our  argument  will  be  analogous  to  the  one  we  used  to  show  that  a  language 
is  not  in  D.  It  is: 

• To prove that a language L2 is not in SD, find a reduction R from some language
L1 that is already known not to be in SD to L2.

If L2 were in SD, then there would exist some Turing machine Oracle that
semidecides it. Then the composition of R and Oracle would semidecide L1. But there can
exist no Turing machine that semidecides L1. So Oracle cannot exist. So L2 is not in SD.

There are two differences between reductions that show that a language is not in SD
and those that show that a language is not in D:

1. We must choose to reduce from a language that is already known not to be in SD
(as opposed to choosing one where all we can prove is that it is not in D). We
already have one example of such a language: ¬H. So we have a place to start.

2. We hypothesize the existence of a semideciding machine Oracle, rather than a
deciding one.

The second of these will sometimes turn out to be critical. In particular, the function ¬
(which inverts the output of Oracle) can no longer be implemented as a Turing machine.
Since Oracle is claimed only to be a semideciding machine, there is no guarantee that it
halts. Since Oracle may loop, there is no way to write a procedure that accepts iff Oracle
doesn't. So we won't be able to include ¬ in any of our reductions.

We will need to find a way around this problem when it arises. But let's first do a very
simple example where there is no need to do the inversion. We begin with a reduction
proof of Theorem 21.15. We show that ¬H ≤M H¬ANY. Let R be a mapping reduction
from ¬H to H¬ANY defined as follows:

R(<M, w>) =

1. Construct the description <M#> of a new Turing machine M#(x) that, on
input x, operates as follows:

1.1. Erase the tape.

1.2. Write w on the tape.

1.3. Run M on w.

2. Return <M#>.

If Oracle exists and semidecides H¬ANY, then C = Oracle(R(<M, w>))
semidecides ¬H. R can be implemented as a Turing machine. And C is correct: M# ignores its
input. It halts on everything or nothing, depending on whether M halts on w. So:

• If <M, w> ∈ ¬H: M does not halt on w, so M# halts on nothing. Oracle(<M#>)
accepts.

• If <M, w> ∉ ¬H: M halts on w, so M# halts on everything. Oracle(<M#>) does
not accept.

But no machine to semidecide ¬H can exist, so neither does Oracle.

Straightforward  reductions  of  this  sort  can  be  used  to  show  that  many  other  lan¬ 
guages,  particularly  those  that  are  defined  by  the  failure  of  a  Turing  machine  to  halt, 
are  also  not  in  SD. 


THEOREM 21.16 "Does M Fail to Halt On ε?" is Not Semidecidable

Theorem: ¬Hε = {<M> : Turing machine M does not halt on ε} is not in SD.

Proof: The proof is by reduction from ¬H. We leave it as an exercise.

Sometimes, however, finding a reduction that works is a bit more difficult. We next
consider the language:

• Aanbn = {<M> : M is a Turing machine and L(M) = AnBn = {a^n b^n : n ≥ 0}}.

Note that Aanbn contains strings that look like:

(q00,a00,q01,a00,→), (q00,a01,q00,a10,→), (q00,a10,q01,a01,←),
(q00,a11,q01,a10,←), (q01,a00,q00,a01,→),
(q01,a01,q01,a10,→), (q01,a10,q01,a11,←), (q01,a11,q11,a01,←)

It does not contain strings like aaabbb. But AnBn does.

We are going to have to try a couple of times to find a correct reduction that can be
used to prove that Aanbn is not in SD.

THEOREM  21.17  "Is  L(M)  =  AnBn?"  is  Not  Semidecidable 

Theorem: The language Aanbn = {<M> : M is a Turing machine and L(M) = AnBn}
is not in SD.

Proof: We show that ¬H ≤M Aanbn and so Aanbn is not in SD. We will build a
mapping reduction R from ¬H to Aanbn. R needs to construct the description of a new
Turing machine M# so that M# is an acceptor for AnBn if M does not halt on w
and it is something else if M does halt on w. We can try the simple M# that first
runs M on w as a gate that controls access to the rest of the program:

Reduction  Attempt  1:  Define: 

R(<M, w>) =

1. Construct the description <M#> of a new Turing machine M#(x) that,
on input x, operates as follows:

1.1. Copy the input x to a second tape.

1.2. Erase the tape.

1.3. Write w on the tape.

1.4. Run M on w.

1.5. Put x back on the first tape.

1.6. If x ∈ AnBn then accept; else loop.

2. Return <M#>.

Now we must show that, if some Turing machine Oracle semidecides Aanbn, then
C = Oracle(R(<M, w>)) semidecides ¬H. But we encounter a problem when we
try to show that C is correct. If M halts on w, then M# makes it to step 1.5 and
becomes an AnBn acceptor, so Oracle(<M#>) accepts. If M does not halt on w, then
M# accepts nothing. It is therefore not an AnBn acceptor, so Oracle(<M#>) does
not accept. The reduction R has succeeded in capturing the correct distinction: Oracle
returns one answer when <M, w> ∈ ¬H and another answer when <M, w> ∉ ¬H.
But the answer is backwards. And this time we can't simply add the function ¬ to the
reduction and define C to return ¬Oracle(R(<M, w>)). Oracle is only
hypothesized to be a semideciding machine so there is no way for ¬ to accept if Oracle fails to
accept (since it may loop).

There is an easy way to fix this. We build M# so that it either accepts just AnBn
(if M does not halt on w) or everything (if M does halt on w). We make that
happen by putting the gate after the code that accepts AnBn instead of before.

Reduction  Attempt  2:  Define: 

R(<M, w>) =

1. Construct the description <M#> of a new Turing machine M#(x) that,
on input x, operates as follows:

1.1. If x ∈ AnBn then accept. Else:

1.2. Erase the tape.

1.3. Write w on the tape.

1.4. Run M on w.

1.5. Accept.

2. Return <M#>.

If Oracle exists and semidecides Aanbn, then C = Oracle(R(<M, w>))
semidecides ¬H. R can be implemented as a Turing machine. And C is correct: M#
immediately accepts all strings in AnBn. If M does not halt on w, those are the only
strings that M# accepts. If M does halt on w, M# accepts everything. So:

• If <M, w> ∈ ¬H: M does not halt on w, so M# accepts AnBn in step 1.1. Then
it gets stuck in step 1.4, so it accepts nothing else. It is an AnBn acceptor.
Oracle(<M#>) accepts.

• If <M, w> ∉ ¬H: M halts on w, so M# accepts everything. L(M#) ≠
AnBn. Oracle(<M#>) does not accept.

But no machine to semidecide ¬H can exist, so neither does Oracle.

Sometimes,  however,  the  simple  gate  technique  doesn't  work,  as  we  will  see  in  the 
next  example. 


THEOREM 21.18 "Does M Halt On Everything?" is Not Semidecidable

Theorem: HALL = {<M> : Turing machine M halts on Σ*} is not in SD.

Proof: We show that HALL is not in SD by reduction from ¬H.

Reduction  Attempt  1:  Define: 

R(<M, w>) =

1. Construct the description <M#> of a new Turing machine M#(x) that,
on input x, operates as follows:

1.1. Erase the tape.

1.2. Write w on the tape.

1.3. Run M on w.

2. Return <M#>.

We can attempt to show that, if Oracle exists and semidecides HALL, then C =
Oracle(R(<M, w>)) correctly semidecides ¬H. The problem is that it doesn't:

• If <M, w> ∈ ¬H: M does not halt on w, so M# gets stuck in step 1.3 and halts
on nothing. Oracle(<M#>) does not accept.

• If <M, w> ∉ ¬H: M halts on w, so M# halts on everything. Oracle(<M#>)
accepts.

This is backwards. We could try halting on something before running M on w,
the way we did on the previous example. But the only way to make M# into a
machine that halts on everything would be to have it halt immediately, before
running M. But then its behavior would not depend on whether M halts on w. We
need a new technique.

Reduction  Attempt  2:  Define: 

R(<M, w>) =

1. Construct the description <M#> of a new Turing machine M#(x) that,
on input x, operates as follows:

1.1. Copy the input x to a second tape.

1.2. Erase the tape.

1.3. Write w on the tape.

1.4. Run M on w for |x| steps or until it naturally halts.

1.5. If M naturally halted, then loop.

1.6. Else halt.

2. Return <M#>.

We build M# so that it runs the simulation of M on w for some finite number of steps and observes whether M would have halted in that time. If M would have halted, M# loops. If M would not have halted, M# promptly halts. This is where we flip from halting to looping and vice versa. It works because the simulation always halts, so M# never gets stuck running it. But for how many steps should we run the simulation? If M is going to halt, we don't know how long it will take for it to do so. We need to guarantee that we don't quit too soon and think that M isn't going to halt when it actually will. Here's the insight: The language Σ* is infinite. So if M# is going to halt on every string in Σ*, it will have to halt on an infinite number of strings. It's okay if M# gets fooled into thinking that M will not halt some of the time, as long as it does not do so for all possible inputs. So M# will run the simulation of M on w for a number of steps equal to the length of its (M#'s) own input. It may be fooled into thinking M is not going to halt on w when it is invoked on short strings. But, if M does eventually halt, it does so in some number of steps n. When started on any string of length n or more, M# will try long enough and will discover that M on w would have halted. Then it will loop. So it will not halt on all strings in Σ*.

If Oracle exists and semidecides H_ALL, then C = Oracle(R(<M, w>)) semidecides ¬H. R can be implemented as a Turing machine. And C is correct:

• If <M, w> ∈ ¬H: M does not halt on w. So, no matter how long x is, M will not halt in |x| steps. So, for all inputs x, M# makes it to step 1.6. So it halts on everything. Oracle(<M#>) accepts.

• If <M, w> ∉ ¬H: M halts on w. It does so in some number of steps n. On inputs of length less than n, M# will make it to step 1.6 and halt. But on all inputs of length n or greater, M# will loop in step 1.5. So it fails to halt on everything. Oracle(<M#>) does not accept.

But no machine to semidecide ¬H can exist, so neither does Oracle.
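
The step-bounded simulation at the heart of this proof can be illustrated with a small Python sketch. It rests on assumptions that are not part of the text: machines are modeled as generator functions that yield once per non-halting step, all function names are hypothetical, and "M# loops" is modeled by returning `False` rather than by actually looping.

```python
from typing import Callable, Iterator

# Assumption: a "machine" is a generator function; each yield is one
# non-halting step, and returning from the generator is the halting step.
Machine = Callable[[str], Iterator[None]]


def halts_in(n: int) -> Machine:
    """Hypothetical machine whose n-th step (n >= 1) is its halting step."""
    def machine(w: str) -> Iterator[None]:
        for _ in range(n - 1):
            yield
    return machine


def never_halts(w: str) -> Iterator[None]:
    """Hypothetical machine that runs forever."""
    while True:
        yield


def halted_within(m: Machine, w: str, steps: int) -> bool:
    """Run m on w for at most `steps` steps; True iff it halted naturally."""
    sim = m(w)
    for _ in range(steps):
        try:
            next(sim)
        except StopIteration:
            return True
    return False


def make_m_sharp(m: Machine, w: str) -> Callable[[str], bool]:
    """Sketch of R(<M, w>). The result reports whether M# halts on x."""
    def m_sharp_halts(x: str) -> bool:
        # steps 1.1-1.4: run M on w for |x| steps
        if halted_within(m, w, len(x)):
            return False    # step 1.5: M halted, so M# loops
        return True         # step 1.6: M# halts
    return m_sharp_halts
```

For a machine built with `halts_in(4)`, the sketch reports that M# halts on inputs of length 0 through 3 and loops on every longer input; for `never_halts`, it reports halting on every input, however long.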

21.6.3 Is L in D, SD/D, or ¬SD?

Throughout this chapter, we have seen examples of languages that are decidable (i.e., they are in D), are semidecidable but not decidable (i.e., they are in SD/D), and, most recently, are not even semidecidable (i.e., they are in ¬SD). We have seen some heuristics that are useful for analyzing a language and determining what class it is in. In applying those heuristics, it is critical that we look closely at the language definition. Small changes can make a big difference to the decidability of the language. For example, consider the following four languages (where, in each case, M is a Turing machine):

1. {<M> : M has an even number of states}.

2. {<M> : |<M>| is even}.

3. {<M> : |L(M)| is even} (i.e., L(M) contains an even number of strings).

4. {<M> : M accepts all even length strings}.

Language 1 is in D. A simple examination of <M> will tell us how many states M has. Language 2 is also in D. To decide it, all we need to do is to examine <M>, the string description of M, and determine whether that string is of even length. Rice's Theorem does not apply in either of those cases since the property we care about involves the physical Turing machine M, not the language it accepts. But languages 3 and 4 are different. To decide either of them requires evaluating a property of the language that a Turing machine M accepts. Rice's Theorem tells us, therefore, that neither of them is in D. In fact, neither of them is in SD either. The intuition here is that to semidecide them by simulation would require trying an infinite number of strings. We leave the proof of this claim as an exercise.

Now consider another set:

1. {<M, w> : Turing machine M does not halt on input string w}. (This is just ¬H.)

2. {<M, w> : Turing machine M rejects w}.

3. {<M, w> : Turing machine M is a deciding Turing machine and M rejects w}.

We know that ¬H is in ¬SD. What about language 2? It seems similar. But it is different in a crucial way and is therefore in SD/D. The following Turing machine semidecides it: Run M on w; if it halts and rejects, accept. The key difference is that now, instead of needing to detect that M loops, we need to detect that M halts and rejects. We can detect halting (but not looping) by simulation. Now consider language 3. If it were somehow possible to know that M were a deciding machine, then there would be a decision procedure to determine whether or not it rejects w: Run M on w. It must halt (since it's a deciding machine). If it rejects, accept, else reject. That would mean language 3 would be in D. But language 3 is, in fact, in ¬SD. It is harder than language 2. The problem is that there is not even a semideciding procedure for the question, "Is M a deciding machine?" That question is equivalent to asking whether M halts on all inputs, which we have shown is not semidecidable.

21.7 Summary of D, SD/D and ¬SD Languages that Include Turing Machine Descriptions

At the beginning of this chapter, we presented a table with a set of questions that we might like to ask about Turing machines and we showed the language formulation of each question. We have now proven where most of those languages fall in the D, SD/D, ¬SD hierarchy that we have defined. (The rest are given as exercises.) So we know whether there exists a decision procedure, a semidecision procedure, or neither, to answer the corresponding question. Because many of these questions are very important as we try to understand the power of the Turing machine formalism, we summarize in Table 21.2 the status of those initial questions, along with some of the others that we have considered in this chapter.


Table 21.2 The problem and the language view.

The Problem View | The Language View | Status
---------------- | ----------------- | ------
Does TM M have an even number of states? | {<M> : TM M has an even number of states} | D
Does TM M halt on w? | H = {<M, w> : TM M halts on w} | SD/D
Does TM M halt on the empty tape? | H_ε = {<M> : TM M halts on ε} | SD/D
Is there any string on which TM M halts? | H_ANY = {<M> : there exists at least one string on which TM M halts} | SD/D
Does TM M halt on all strings? | H_ALL = {<M> : TM M halts on Σ*} | ¬SD
Does TM M accept w? | A = {<M, w> : TM M accepts w} | SD/D
Does TM M accept ε? | A_ε = {<M> : TM M accepts ε} | SD/D
Is there any string that TM M accepts? | A_ANY = {<M> : there exists at least one string that TM M accepts} | SD/D
Does TM M fail to halt on w? | ¬H = {<M, w> : TM M does not halt on w} | ¬SD
Does TM M accept all strings? | A_ALL = {<M> : L(M) = Σ*} | ¬SD
Do TMs Ma and Mb accept the same languages? | EqTMs = {<Ma, Mb> : L(Ma) = L(Mb)} | ¬SD
Is it the case that TM M does not halt on any string? | H_¬ANY = {<M> : there does not exist any string on which TM M halts} | ¬SD
Does TM M fail to halt on its own description? | {<M> : TM M does not halt on input <M>} | ¬SD
Is TM M minimal? | TM_MIN = {<M> : TM M is minimal} | ¬SD
Is the language that TM M accepts regular? | TM_REG = {<M> : L(M) is regular} | ¬SD
Is L(M) = AⁿBⁿ? | Aanbn = {<M> : L(M) = AⁿBⁿ} | ¬SD

Exercises

1. For each of the following languages L, state whether L is in D, SD/D, or ¬SD. Prove your claim. Assume that any input of the form <M> is a description of a Turing machine.

a. {a}

b. {<M> : a ∈ L(M)}

c. {<M> : L(M) = {a}}

d. {<Ma, Mb> : Ma and Mb are Turing machines and ε ∈ L(Ma) − L(Mb)}

e. {<Ma, Mb> : Ma and Mb are Turing machines and L(Ma) = L(Mb) − {ε}}

f. {<Ma, Mb> : Ma and Mb are Turing machines and L(Ma) ≠ L(Mb)}

g. {<M, w> : M, when operating on input w, never moves to the right on two consecutive moves}

h. {<M> : M is the only Turing machine that accepts L(M)}

i. {<M> : L(M) contains at least two strings}

j. {<M> : M rejects at least two even length strings}

k. {<M> : M halts on all palindromes}

l. {<M> : L(M) is context-free}

m. {<M> : L(M) is not context-free}

n. {<M> : A#(L(M)) > 0}, where A#(L) = |L ∩ {a*}|

o. {<M> : |L(M)| is a prime integer > 0}

p. {<M> : there exists a string w such that |w| < |<M>| and that M accepts w}

q. {<M> : M does not accept any string that ends with 0}

r. {<M> : there are at least two strings w and x such that M halts on both w and x within some number of steps s, and s < 1000 and x is prime}

s. {<M> : there exists an input on which M halts in fewer than |<M>| steps}

t. {<M> : L(M) is infinite}

u. {<M> : L(M) is uncountably infinite}

v. {<M> : M accepts the string <M, M> and does not accept the string <M>}

w. {<M> : M accepts at least two strings of different lengths}

x. {<M> : M accepts exactly two strings and they are of different lengths}

y. {<M, w> : M accepts w and rejects wᴿ}

z. {<M, x, y> : M accepts xy}

aa. {<D> : <D> is the string encoding of a deterministic FSM D and L(D) = ∅}

2.  In  E.3,  we  describe  a  straightforward  use  of  reduction  that  solves  a  grid  coloring 
problem  by  reducing  it  to  a  graph  problem.  Given  the  grid  G  shown  here: 

a.  Show  the  graph  that  corresponds  to  G. 

b.  Use  the  graph  algorithm  we  describe  to  find  a  coloring  of  G. 

3. In this problem, we consider the relationship between H and a very simple language {a}.

a. Show that {a} is mapping reducible to H.

b. Is it possible to reduce H to {a}? Prove your answer.

4. Show that H_ALL = {<M> : Turing machine M halts on Σ*} is not in D by reduction from H.

5. Show that each of the following languages is not in D.

a. A_ε

b. A_ANY

c. A_ALL

d. {<M, w> : Turing machine M rejects w}

e. {<M, w> : Turing machine M is a deciding Turing machine and M rejects w}

6. Show that L = {<M> : Turing machine M, on input ε, ever writes 0 on its tape} is in D iff H is in D. In other words, show that L ≤ H and H ≤ L.

7. Show that each of the following questions is undecidable by recasting it as a language recognition problem and showing that the corresponding language is not in D.

a. Given a program P, input x, and a variable n, does P, when running on x, ever assign a value to n?

Can a compiler check to make sure every variable is initialized before it is used? (G.4.4)

b. Given a program P and code segment S in P, does P reach S on every input (in other words, can we guarantee that S happens)?

c. Given a program P and a variable x, is x always initialized before it is used?

d. Given a program P and a file f, does P always close f before it exits?

e. Given a program P with an array reference of the form a[i], will i, at the time of the reference, always be within the bounds declared for the array?

f. Given a program P and a database of objects d, does P perform the function f on all elements of d?

8. Theorem J.1 tells us that the safety of even a very simple security model is undecidable. Its proof is by reduction from H_ε. Show an alternative proof that reduces A = {<M, w> : M is a Turing machine and w ∈ L(M)} to the language Safety.

9. Show that each of the following languages is not in SD.

a. ¬H_ε

b. EqTMs

c. TM_REG

d. {<M> : |L(M)| is even}

e. {<M> : Turing machine M accepts all even length strings}

f. {<M> : Turing machine M accepts no even length strings}

g. {<M> : Turing machine M does not halt on input <M>}

h. {<M, w> : M is a deciding Turing machine and M rejects w}

10. Do the other half of the proof of Rice's Theorem. In other words, show that the theorem holds if P(∅) = True.

11. For each of the following languages L, do two things:

i. State whether or not Rice's Theorem has anything to tell us about the decidability of L.

ii. State whether L is in D, SD/D, or not in SD.

a. {<M> : Turing machine M accepts all strings that start with a}.

b. {<M> : Turing machine M halts on ε in no more than 1000 steps}.

c. ¬L1, where L1 = {<M> : Turing machine M halts on all strings in no more than 1000 steps}.

d. {<M, w> : Turing machine M rejects w}.

12. Use Rice's Theorem to prove that each of the following languages L is not in D:

a. {<M> : Turing machine M accepts at least two odd length strings}.

b. {<M> : M is a Turing machine and |L(M)| = 12}.

13. Prove that there exists no mapping reduction from H to the language L2 that we defined in Theorem 21.9.

14. Let Σ = {1}. Show that there exists at least one undecidable language with alphabet Σ.

15. Give an example of a language L such that neither L nor ¬L is decidable.

16. Let repl be a function that maps from one language to another. It is defined as follows:

repl(L) = {w : ∃x ∈ L and w = xx}.

a. Are the context-free languages closed under repl? Prove your answer.

b. Are the decidable languages closed under repl? Prove your answer.

17. For any nonempty alphabet Σ, let L be any decidable language other than ∅ or Σ*. Prove that L ≤M ¬L.

18. We will say that L1 is doubly reducible to L2, which we will write as L1 ≤D L2, iff there exist two computable functions f1 and f2 such that:

∀x ∈ Σ* ((x ∈ L1) iff (f1(x) ∈ L2 and f2(x) ∉ L2)).

Prove or disprove each of the following claims:

a. If L1 ≤ L2 and L2 ≠ Σ*, then L1 ≤D L2.

b. If L1 ≤D L2 and L2 ∈ D, then L1 ∈ D.

c. For every language L2, there exists a language L1 such that ¬(L1 ≤D L2).

19. Let L1 and L2 be any two SD/D languages such that L1 ⊂ L2. Is it possible that L1 is reducible to L2? Prove your answer.

20. If L1 and L2 are decidable languages and L1 ⊆ L ⊆ L2, must L be decidable? Prove your answer.

21. Goldbach's conjecture states that every even integer greater than 2 can be written as the sum of two primes. (Consider 1 to be prime.) Suppose that A = {<M, w> : M is a Turing machine and w ∈ L(M)} were decidable by some Turing machine Oracle. Define the following function:

G() = True if Goldbach's conjecture is true,
      False otherwise.

Use Oracle to describe a Turing machine that computes G. You may assume the existence of a Turing machine P that decides whether a number is prime.

22. A language L is D-complete iff (1) L is in D, and (2) for every language L' in D, L' ≤M L. Consider the following claim: If L ∈ D and L ≠ Σ* and L ≠ ∅, then L is D-complete. Prove or disprove this claim.


CHAPTER  22 

Decidability  of  Languages  That 
Do  Not  (Obviously)  Ask  Questions 
about  Turing  Machines* 


If the only problems that were undecidable were questions involving the behavior of Turing Machines (and thus programs written in any reasonable formalism), we would still care. After all, being able to prove properties of the programs that we write is of critical importance when bugs could mean the loss of millions of dollars or hundreds of lives. But Turing Machines do not own the market in undecidable problems. In this chapter, we will look at some examples of undecidable problems that do not (at least directly) ask questions about Turing Machines.

Although the problems we will consider here do not appear to involve Turing Machines, each of the undecidability proofs that we will describe is based, either directly or indirectly, on a reduction from a language (such as H, A, or ¬H_ε) whose definition does refer to the behavior of Turing machines. Many of these proofs are based on variants of a single idea, namely the fact that it is possible to encode a Turing machine configuration as a string. For example, the string abq100cd can encode the configuration of a Turing machine that is in state 4, with abcd on the tape and the read/write head positioned on the c. Then we can encode a computation of a Turing machine as a sequence of configurations, separated by delimiters. For example, we might have:

#q0❑abcd#q1abcd#aq1bcd#

Or we can encode a computation as a table, where each row corresponds to a configuration. So, for example, we might have:

#q0❑abcd#

#❑q1abcd#

#❑aq1bcd#
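
The configuration encoding described above can be sketched in a few lines of Python. The function names and the use of `_` for the blank square are assumptions made for this illustration; the only convention taken from the text is that the state is written as `q` followed by the state number in binary, placed immediately before the scanned square.

```python
from typing import Iterable, Tuple

Config = Tuple[str, int, int]  # (tape contents, state number, head position)


def encode_config(tape: str, state: int, head: int) -> str:
    """Write 'q' + binary state number just before the scanned tape square."""
    if state < 0 or not 0 <= head <= len(tape):
        raise ValueError("state must be >= 0 and head must lie on the tape")
    return tape[:head] + "q" + format(state, "b") + tape[head:]


def encode_computation(configs: Iterable[Config], delimiter: str = "#") -> str:
    """Join the encoded configurations of a computation with delimiters."""
    encoded = [encode_config(tape, state, head) for tape, state, head in configs]
    return delimiter + delimiter.join(encoded) + delimiter


def encode_table(configs: Iterable[Config], delimiter: str = "#") -> str:
    """Same computation, one delimited configuration per row."""
    return "\n".join(
        delimiter + encode_config(tape, state, head) + delimiter
        for tape, state, head in configs
    )
```

For instance, `encode_config("abcd", 4, 2)` gives `abq100cd`, the state-4 configuration mentioned in the text.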


To show that a new language is not decidable, we will then define a reduction that maps from one of these representations of a Turing machine's computations to some essential structure of the new problem. We will design the reduction so that the new problem's structure possesses some key property iff the Turing machine's computation enters a halting state (or an accepting state) or fails to enter a halting state, or whatever it is we need to check. So, if there is a procedure to decide whether an instance of the new problem possesses the key property, then there is also a way to check whether a Turing machine halts (or accepts or whatever). So, for example, suppose that we are using the table representation and that M, the Turing machine whose computation we have described, does not halt. Then the table will have an infinite number of rows. In the proof we'll sketch in Section 22.3, the cells of the table will correspond to tiles that must be arranged according to a small set of rules. So, if we could tell whether an infinite arrangement of tiles exists, we could tell whether the table is infinite (and thus whether M fails to halt).

22.1 Diophantine Equations and Hilbert's 10th Problem

In 1900, the German mathematician David Hilbert presented a list of 23 problems that he argued should be the focus of mathematical research as the new century began. The 10th of his problems concerned systems of Diophantine equations (polynomials in any number of variables, all with integer coefficients), such as:

4x³ + 7xy + 2z² − 23x⁴z = 0.

A Diophantine problem is, "Given a system of Diophantine equations, does it have an integer solution?" Hilbert asked whether there exists a decision procedure for Diophantine problems. Diophantine problems are important in applications in which the variables correspond to quantities of indivisible objects in the world. For example, suppose x is the number of shares of stock A to be bought, y is the number of shares of stock B, and z is the number of shares of stock C. Since it is not generally possible to buy fractions of shares of stock, any useful solution to an equation involving x, y, and z would necessarily assign integer values to each variable.

We can recast the Diophantine problem as the language TENTH = {<w> : w is a system of Diophantine equations that has an integer solution}. In 1970, Yuri Matiyasevich proved a general result from which it follows that the answer to Hilbert's question is no: TENTH is not in D. Using the Fibonacci sequence (defined in Example 24.4), Matiyasevich proved that every semidecidable set S is Diophantine, by which we mean that there is a reduction from S to the problem of deciding whether some system of Diophantine equations has an integer solution. So, if the Diophantine problem were decidable, every semidecidable set would be decidable. But we know that there are semidecidable sets (e.g., the language H) that are not decidable. So the Diophantine problem is not decidable either.

As an aside, however, we should point out that when all of the terms are of degree 1, Diophantine problems are decidable, which is good news for the writers of puzzle problems of the sort:

A farmer buys 100 animals for $100.00. The animals include at least one cow, one pig, and one chicken, but no other kind. If a cow costs $10.00, a pig costs $3.00, and a chicken costs $0.50, how many of each did he buy?
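
Because every variable in the puzzle is bounded by the total number of animals, a degree 1 instance like this one can be settled by exhaustive search. The Python sketch below is one way to do that; the function name and its parameters are hypothetical, and prices are expressed in cents so that all arithmetic stays in the integers.

```python
from typing import List, Tuple


def farmer_puzzle(total_animals: int = 100,
                  total_cents: int = 10_000) -> List[Tuple[int, int, int]]:
    """Return every (cows, pigs, chickens) triple that satisfies the puzzle.

    Prices in cents: cow 1000, pig 300, chicken 50. Each kind of animal
    must appear at least once.
    """
    solutions = []
    for cows in range(1, total_animals + 1):
        # Leave room for at least one chicken.
        for pigs in range(1, total_animals - cows):
            chickens = total_animals - cows - pigs
            if 1000 * cows + 300 * pigs + 50 * chickens == total_cents:
                solutions.append((cows, pigs, chickens))
    return solutions


if __name__ == "__main__":
    print(farmer_puzzle())  # [(5, 1, 94)]
```

The search finds a single answer: 5 cows, 1 pig, and 94 chickens. Eliminating the number of chickens from the two constraints gives 19c + 5p = 100 for c cows and p pigs, which has only that one solution in positive integers.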

It is also true that Diophantine problems that involve just a single variable are decidable. And quadratic Diophantine problems of two variables are decidable. These are problems of the form ax² + by = c, where a, b, and c are positive integers and we ask whether there exist integer values of x and y that satisfy the equation.

We will return to the question of the solvability of Diophantine problems in Part V. There we will see that:

• Diophantine problems of degree 1 (like the cows, pigs, and chickens problem) and Diophantine problems of a single variable of the form axᵏ = c are not only solvable, they are efficiently solvable (i.e., there exists a polynomial-time algorithm to solve them).

• Quadratic Diophantine problems are solvable, but there appears to exist no polynomial-time algorithm for doing so. The quadratic Diophantine problem belongs to the complexity class NP-complete.

• The general Diophantine problem is undecidable, so not even an inefficient algorithm for it exists.


22.2  Post  Correspondence  Problem 

Consider two lists of strings over some alphabet Σ. The lists must be finite and of equal length. We can call the two lists X and Y. So we have:

X = x1, x2, x3, ..., xn.

Y = y1, y2, y3, ..., yn.

Now we ask a question about the lists: Does there exist some finite sequence of integers that can be viewed as indices of X and Y such that, when elements of X are selected as specified and concatenated together, we get the same string that we get when elements of Y are also concatenated together as specified? For example, if we assert that 1, 3, 4 is such a sequence, we're asserting that x1x3x4 = y1y3y4. Any problem of this form is an instance of the Post Correspondence Problem, first formulated in the 1940s by Emil Post.


EXAMPLE 22.1 A PCP Instance with a Simple Solution

Let PCP1 be:

i | X    | Y
- | ---- | ----
1 | b    | aab
2 | abb  | b
3 | aba  | a
4 | baaa | baba

PCP1 has a very simple solution: 3, 4, 1, which is a solution because (ignoring spaces, which are shown here just to make it clear what is happening):

aba baaa b = a baba aab

It also has an infinite number of other solutions, including 3, 4, 1, 3, 4, 1 and 3, 4, 1, 3, 4, 1, 3, 4, 1.
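
Checking a proposed solution is purely mechanical, as the following Python sketch shows. The function name `is_pcp_solution` and the list representation are assumptions made for illustration; indices are 1-based, as in the example.

```python
from typing import Sequence


def is_pcp_solution(xs: Sequence[str], ys: Sequence[str],
                    indices: Sequence[int]) -> bool:
    """True iff the nonempty, 1-based index sequence makes both sides equal."""
    if not indices or any(not 1 <= i <= len(xs) for i in indices):
        return False
    x_side = "".join(xs[i - 1] for i in indices)
    y_side = "".join(ys[i - 1] for i in indices)
    return x_side == y_side


# The instance PCP1 from Example 22.1.
PCP1_X = ["b", "abb", "aba", "baaa"]
PCP1_Y = ["aab", "b", "a", "baba"]

if __name__ == "__main__":
    print(is_pcp_solution(PCP1_X, PCP1_Y, [3, 4, 1]))   # True
    print(is_pcp_solution(PCP1_X, PCP1_Y, [1, 3, 4]))   # False
```

Repeating a solution gives another solution, which is why the sequence 3, 4, 1, 3, 4, 1 also passes the check.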


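EXAMPLE 22.2 A PCP Instance with No Solution

Let PCP2 be:

i | X   | Y
- | --- | ---
1 | 11  | 011
2 | 01  | 0
3 | 001 | 110

PCP2 has no solution. It is straightforward to show this by trying candidate solutions. Mismatched symbols that cause the current path to die are marked with an x.

[Figure: search tree of candidate solutions, rooted at "start". The first index on every surviving path is 2, the only first choice whose X and Y strings agree. Its three continuations:]

second index = 1: X = 0111, Y = 0011 (x)

second index = 2: X = 0101, Y = 00 (x)

second index = 3: X = 01001, Y = 0110 (x)

All paths have failed to find a solution.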
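
The path-by-path argument above can be mechanized as a breadth-first search that abandons a path as soon as neither concatenation is a prefix of the other. The Python sketch below is an illustration under assumptions: the function name, the three status strings, and the `max_len` cap are all hypothetical. Since PCP is undecidable in general, the search can report "unknown" whenever it hits the cap with some path still alive.

```python
from collections import deque
from typing import Optional, Sequence, Tuple


def pcp_search(xs: Sequence[str], ys: Sequence[str],
               max_len: int) -> Tuple[str, Optional[Tuple[int, ...]]]:
    """Breadth-first search with prefix pruning.

    Returns ("solved", indices) if a solution of length <= max_len exists,
    ("none", None) if every path died, so no solution exists at all, and
    ("unknown", None) if some path was still alive at the length cap.
    """
    frontier = deque([((), "", "")])
    hit_cap = False
    while frontier:
        seq, x, y = frontier.popleft()
        for i in range(len(xs)):
            nx, ny = x + xs[i], y + ys[i]
            if nx == ny:
                return ("solved", seq + (i + 1,))
            # Keep the path only if one side is still a prefix of the other.
            if nx.startswith(ny) or ny.startswith(nx):
                if len(seq) + 1 < max_len:
                    frontier.append((seq + (i + 1,), nx, ny))
                else:
                    hit_cap = True
    return ("unknown", None) if hit_cap else ("none", None)


# The instance PCP2 from Example 22.2.
PCP2_X = ["11", "01", "001"]
PCP2_Y = ["011", "0", "110"]

if __name__ == "__main__":
    print(pcp_search(PCP2_X, PCP2_Y, max_len=10))  # ('none', None)
```

On PCP2 the frontier empties after the second level, which is exactly the situation the example describes: only index 2 survives the first step, and all three of its continuations mismatch.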


EXAMPLE 22.3 A PCP Instance with No Simple Solutions

Let PCP3 be:

i | X    | Y
- | ---- | ---
1 | 1101 | 1
2 | 0110 | 11
3 | 1    | 110

PCP3 has solutions (in fact, an infinite number of them), but the shortest one has length 252.


We can formulate the Post Correspondence Problem as a language decision problem. To do that, we need to define a way to encode each instance of the Post Correspondence Problem as a string. Let Σ be any nonempty alphabet. Then an instance of the Post Correspondence Problem is a string <P> of the form:

<P> = (x1, x2, x3, ..., xn)(y1, y2, y3, ..., yn), where ∀j (xj ∈ Σ⁺ and yj ∈ Σ*).

For example, <PCP1> = (b, abb, aba, baaa)(aab, b, a, baba).

We'll say that a PCP instance has size n whenever the number of strings in its X list is n. (In this case, the number of strings in its Y list is also n.) A solution to a PCP instance of size n is a finite sequence i1, i2, ..., ik of integers such that:

∀ 1 ≤ j ≤ k (1 ≤ i_j ≤ n and x_i1 x_i2 ... x_ik = y_i1 y_i2 ... y_ik).

To define a concrete PCP language, we need to fix an alphabet. We'll let Σ be {0, 1} and simply encode any other alphabet in binary. Now the problem of determining whether a particular instance P of the Post Correspondence Problem has a solution can be recast as the problem of deciding the language:

• PCP = {<P> : P is an instance of the Post Correspondence Problem and P has a solution}.


THEOREM 22.1 The Undecidability of PCP

Theorem: The language PCP = {<P> : P is an instance of the Post Correspondence Problem and P has a solution} is in SD/D.

Proof: We will first show that PCP is in SD by building a Turing machine M_PCP(<P>) that semidecides it. The idea is that M_PCP will simply try all possible solutions of length 1, then all possible solutions of length 2, and so forth. If there is any finite sequence of indices that is a solution, M_PCP will find it. To describe more clearly how M_PCP works, we first observe that any solution to a PCP problem P = (x1, x2, x3, ..., xn)(y1, y2, y3, ..., yn) is a finite sequence of integers between 1 and n. We can build a Turing machine M# that lexicographically enumerates all such sequences. Now we define:

M_PCP(<P>) =

1. Invoke M#.

2. As each string is enumerated, see if it is a solution to P. If so, halt.

Next we must prove that PCP is not in D. There are two approaches we could take to doing this proof. One is to use reduction from H. The idea is that, to decide whether <M, w> is in H, we create a PCP instance that simulates the computation history of M running on w. We do the construction in such a way that there exists a finite sequence that solves the PCP problem iff the computation halts. So, if we could decide PCP, we could decide H. An alternative is to make use of the grammar formalism that we will define in the next chapter. We'll show there that unrestricted grammars generate all and only the SD languages. So the problem of deciding whether a grammar G generates a string w is equivalent to deciding whether a Turing machine M accepts w and is thus undecidable. Given that result, we can prove that PCP is not in D by a construction that maps a grammar to a PCP instance in such a way that there exists a finite sequence that solves the PCP problem iff there is a finite derivation of w using the rules of G. This second approach is somewhat easier to explain, so it is the one we'll use. We give the proof in E.4.


It turns out that some special cases of PCP are decidable. For example, if we restrict
our attention to problems of size 2, then PCP is decidable. A bounded version of PCP
is also decidable. Define the language:

• BOUNDED-PCP = {<P, k> : P is an instance of the Post Correspondence problem
that has a solution of length less than or equal to k}.

While BOUNDED-PCP is decidable (by a straightforward algorithm that simply tries
all candidate solutions of length up to k), it appears not to be efficiently decidable. It is a
member of the complexity class NP-complete, which we will define in Section 28.2.
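
A minimal sketch of the straightforward algorithm for BOUNDED-PCP follows. As before, representing P as a list of (x, y) string pairs is an assumption, and the helper candidates is added only to show how quickly the search space grows with k.

```python
from itertools import product

def bounded_pcp(P, k):
    """Decide whether P has a solution of length at most k by trying every candidate."""
    for length in range(1, k + 1):
        for indices in product(range(len(P)), repeat=length):
            top = "".join(P[i][0] for i in indices)
            bottom = "".join(P[i][1] for i in indices)
            if top == bottom:
                return True
    return False  # always reached after finitely many candidates

def candidates(n, k):
    """Number of index sequences examined in the worst case: n + n^2 + ... + n^k."""
    return sum(n ** length for length in range(1, k + 1))

# Illustrative instance (an assumption, not taken from the text).
P = [("b", "bbb"), ("babbb", "ba"), ("ba", "a")]
print(bounded_pcp(P, 3), bounded_pcp(P, 4), candidates(len(P), 4))
```

Unlike the unbounded search, this procedure always halts, but the number of candidates grows exponentially in k.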

The fact that PCP is not decidable in general is significant. As we will see in
Section 22.5.3, reduction from PCP is a convenient way to show the undecidability of
other kinds of problems, including some that involve context-free languages.


22.3  Tiling  Problems 

Consider a class of tiles called Wang tiles or Wang dominos. A Wang tile is a square
that has been divided into four regions by drawing two diagonal lines, as shown in
Figure 22.1. Each region is colored with one of a fixed set of colors.

Now suppose that you are given a finite set of such tile designs, all of the same size, for
example the set of three designs shown here. Further suppose that you have an infinite
supply of each type of tile. Then we may ask whether or not it is possible to tile an arbitrary
surface in the plane with the available designs while adhering to the following rules:


1.  Each  tile  must  be  placed  so  that  it  is  touching  its  neighbors  on  all  four  sides  (if 
such  neighbors  exist).  In  other  words,  no  gaps  or  overlaps  are  allowed. 

2.  When  two  tiles  are  placed  so  that  they  adjoin  each  other,  the  adjoining  regions  of 
the  two  tiles  must  be  the  same  color. 

3.  No  rotations  or  flips  are  allowed. 


FIGURE 22.1 A tiling problem.

EXAMPLE 22.4 A Set of Tiles that Can Tile the Plane

The set of tiles shown in Figure 22.1 can be used to tile any surface in the plane.
Here is a small piece of the pattern that can be built and then repeated as necessary
since its right and left sides match, as do its top and bottom:


EXAMPLE  22.5  A  Set  of  Tiles  that  Cannot  Tile  the  Plane 

Now  consider  a  new  set  of  tiles: 


Only  a  small  number  of  small  regions  can  be  tiled  with  this  set.  To  see  this, 
start  with  tile  1,  add  a  tile  below  it  and  then  try  to  extend  the  pattern  sideways. 
Then  start  with  tile  2  and  show  the  same  thing.  Then  observe  that  tile  3,  the  only 
one  remaining,  cannot  be  placed  next  to  itself. 


We can formulate the tiling problem, as we have just described it, as a language to be
decided. To do that, we need to define a way to encode each instance of the tiling problem
(i.e., a set of tile designs) as a string. We will represent each design as an ordered 4-tuple of
values drawn from the set {G, W, B}. To describe a design, start in the top region and then
go around the tile clockwise. So, for example, the tile set of Figure 22.1 could be represented
as:

(G W W W) (W W B G) (B G G W).

Now we can define:

• TILES = {<T> : every finite surface on the plane can be tiled, according to the
rules, with the tile set T}.

The string (G W W W) (W W B G) (B G G W), which corresponds to the tile set of
Example 22.4, is in TILES.

The string (G W W W) (W W G G) (B G B W), which corresponds to the tile set of
Example 22.5, is not in TILES.

Is TILES in D? In other words, does there exist a decision procedure that determines,
for a given set of tiles, whether or not it can be used to tile an arbitrary surface
in the plane? Consider the following conjecture, called Wang's conjecture: If a given set
of tiles can be used to tile an arbitrary surface, then it can always do so periodically. In
other words, there must exist a finite area that can be tiled and then repeated infinitely
often to cover any desired surface. For example, the tile set of Example 22.4 covers the
plane periodically using the 3 × 3 grid shown above. If Wang's conjecture were true,
then the tiling question would be decidable by considering successively larger square
grids in search of one that can serve as the basis for a periodic tiling. If such a grid exists,
it will be found. If no such grid exists, then it is possible to prove (using a result
known as the König infinity lemma) that there must exist a finite square grid that cannot
be tiled at all. So this procedure must eventually halt, either by finding a grid that
can be used for a periodic tiling or by finding a grid that cannot be tiled at all (and thus
discovering that no periodic tiling exists).
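
The procedure just described can be sketched as follows. The sketch assumes that each tile is a 4-tuple (top, right, bottom, left), matching the clockwise encoding used above, with colors written G, W, and B. The function names and the max_n cutoff are ours. The cutoff is needed because the loop is guaranteed to stop only for tile sets that obey Wang's conjecture.

```python
TOP, RIGHT, BOTTOM, LEFT = range(4)

def can_tile(tiles, n, periodic=False):
    """Backtracking search for a legal n x n tiling (no rotations or flips).

    With periodic=True the right and left edges, and the bottom and top edges,
    of the grid must also match, so that the grid can be repeated.
    """
    grid = [[None] * n for _ in range(n)]

    def ok(r, c):
        t = grid[r][c]
        if c > 0 and grid[r][c - 1][RIGHT] != t[LEFT]:
            return False
        if r > 0 and grid[r - 1][c][BOTTOM] != t[TOP]:
            return False
        if periodic and c == n - 1 and t[RIGHT] != grid[r][0][LEFT]:
            return False
        if periodic and r == n - 1 and t[BOTTOM] != grid[0][c][TOP]:
            return False
        return True

    def fill(k):
        if k == n * n:
            return True
        r, c = divmod(k, n)  # cells are filled row by row
        for t in tiles:
            grid[r][c] = t
            if ok(r, c) and fill(k + 1):
                return True
        grid[r][c] = None
        return False

    return fill(0)

def wang_procedure(tiles, max_n=6):
    """Look at successively larger square grids, as in the text."""
    for n in range(1, max_n + 1):
        if can_tile(tiles, n, periodic=True):
            return ("periodic", n)    # an n x n block that can be repeated
        if not can_tile(tiles, n):
            return ("untileable", n)  # an n x n square that cannot be tiled
    return ("unknown", max_n)         # cutoff reached; no conclusion

EX_22_4 = [tuple("GWWW"), tuple("WWBG"), tuple("BGGW")]
EX_22_5 = [tuple("GWWW"), tuple("WWGG"), tuple("BGBW")]
print(wang_procedure(EX_22_4), wang_procedure(EX_22_5))
```

For the first tile set the search finds a repeatable 3 × 3 block. For the second it finds that no 3 × 3 square can be tiled at all.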


It is possible to make many kinds of changes to the kinds of tiles that are allowed
without altering the undecidability properties of the tiling problem as
we have presented it for Wang tiles. Tiling problems, in this broader sense,
have widespread applications in the physical world. For example, the
growth of crystals can often be described as a tiling.


As it turns out, Wang's conjecture is false. There exist tile sets that can tile an arbitrary
area aperiodically (i.e., without any repeating pattern) but for which no periodic
tiling exists. Of course, that does not mean that TILES must not be in D. There might
exist some other way to decide it. But there does not. TILES is not in D. In fact, it is not
even in SD, although ¬TILES is.

THEOREM  22.2  The  Undecidability  of  TILES 

Theorem: The language TILES = {<T> : every finite surface on the plane can
be tiled, according to the rules, with the tile set T} is not in D. It is also not in SD.
But ¬TILES is in SD.

Proof: We first prove that ¬TILES is in SD. Consider a search space defined as
follows: The start state contains no tiles. From any state s, construct the set of
successor states, each of which is built by adding one tile, according to the rules,
to the configuration in s. We can build a Turing machine M that semidecides
¬TILES by systematically exploring this space. If and only if it ever happens that
all the branches reach a dead end in which there is no legal move, then there is no
tiling and M accepts.

If TILES were also in SD, then, by Theorem 20.6, it would be in D. But it is not.
The proof that it is not is by reduction from ¬Hε. The idea behind the reduction
is to describe a way to map an arbitrary Turing machine M into a set of tiles T in
such a way that T is in TILES iff M does not halt on ε. The reduction uses a row
of tiles to correspond to a configuration of M. It begins by creating a row that corresponds
to M's initial configuration when started on a blank tape. Then the next
row will correspond to M's next configuration, and so forth. There is always a
next configuration of M and thus a next row in the tiling iff M does not halt. T is
in TILES iff there is always a next row (i.e., T can tile an arbitrarily large area).
So if it were possible to semidecide whether T is in TILES it would be possible to
semidecide whether M fails to halt on ε. But we know (from Theorem 21.16) that
¬Hε is not in SD. So neither is TILES.

The language TILES corresponds to an unbounded tiling problem. We can also formulate
a bounded version: "Given a particular stack of n² tiles (for some value of n), is
it possible to tile an n × n surface in the plane?" This problem is clearly decidable by
the straightforward algorithm that simply tries all ways of placing the n² tiles on the n²
cells of an n × n grid. But there is still bad news. The theory of time complexity that we
will describe in Chapter 28 provides the basis for formalizing the following claim: The
bounded tiling problem is apparently intractable. The time required to solve it by the
best known algorithm grows exponentially with n. We will return to this discussion in
Exercise 28.20.
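
The brute-force algorithm for the bounded problem can be sketched as below. The tile representation (a 4-tuple read as top, right, bottom, left) and the function name are assumptions made for illustration. The sketch tries every ordering of the stack, so it examines up to (n²)! arrangements and is practical only for very small n.

```python
from itertools import permutations
from math import isqrt

TOP, RIGHT, BOTTOM, LEFT = range(4)

def bounded_tiling(stack):
    """Decide whether the n*n tiles in stack can be laid out legally on an n x n grid."""
    n = isqrt(len(stack))
    if n * n != len(stack):
        raise ValueError("the stack must contain n*n tiles")
    for order in permutations(stack):
        # Cut the ordering into n rows of n tiles each.
        grid = [order[r * n:(r + 1) * n] for r in range(n)]
        sides_ok = all(grid[r][c][RIGHT] == grid[r][c + 1][LEFT]
                       for r in range(n) for c in range(n - 1))
        stacks_ok = all(grid[r][c][BOTTOM] == grid[r + 1][c][TOP]
                        for r in range(n - 1) for c in range(n))
        if sides_ok and stacks_ok:
            return True
    return False

t1, t2, t3 = tuple("GWWW"), tuple("WWBG"), tuple("BGGW")
print(bounded_tiling([t1, t1, t2, t3]), bounded_tiling([t3, t3, t3, t3]))
```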


22.4 Logical Theories

Even before anyone had seen a computer, mathematicians were interested in the question:
"Does there exist an algorithm to determine, given a statement in a logical language,
whether or not it is a theorem?" In other words, can it be proved from the
available axioms plus the rules of inference? In the case of formulas in first-order logic,
this problem even had a specific name, the Entscheidungsproblem. With the advent of
the computer, the Entscheidungsproblem acquired more than mathematical interest. If
such an algorithm existed, it could play a key role in programs that, among other
things:

• decide whether other programs are correct.

•  determine  that  a  plan  for  controlling  a  manufacturing  robot  is  correct. 

•  accept  or  reject  interpretations  for  English  sentences,  based  on  whether  or  not  they 
make  sense. 


22.4.1 Boolean Theories

If we consider only Boolean logic formulas, such as (P ∧ (Q ∨ ¬R) → S), then there
exist procedures to decide all of the following questions:

• Given a well-formed formula (wff) w, is w valid (i.e., is it true for all assignments of
truth values to its variables)?

• Given a wff w, is w satisfiable (i.e., is there some assignment of truth values to its
variables such that w is true)?

• Given a wff w and a set of axioms A, is w a theorem (i.e., can it be proved from A)?

Alternatively, all of the following languages are in D:

• VALID = {<w> : w is a wff in Boolean logic and w is valid}

• SAT = {<w> : w is a wff in Boolean logic and w is satisfiable}

• PROVABLE = {<A, w> : w is a wff in Boolean logic, A is a set of axioms in
Boolean logic and w is provable from A}


Suppose that the specification for a hardware device or a software system
can be described in terms of a finite number of states. Then it can be written
as a Boolean formula. Then one way to verify the correctness of a particular
implementation is to see whether it satisfies the specification. The
fact that SAT is decidable makes this approach, called model checking,
possible. (H.1.2)


There is a straightforward procedure for answering all of these questions since
each wff contains only a finite number of variables and each variable can take on
one of two possible values (True or False). So it suffices to try all the possibilities. A
wff is valid iff it is true in all assignments of truth values to its variables. It is satisfiable
iff it is true in at least one such assignment. A wff w is provable from A iff
(A → w) is valid.
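
The try-all-possibilities procedure can be sketched as follows. The representation is an assumption made for illustration: a variable is a string, and a compound wff is a tuple whose first element is one of "not", "and", "or", or "implies". The function names are ours.

```python
from itertools import product

def variables(f):
    """The set of variable names that occur in the wff f."""
    if isinstance(f, str):
        return {f}
    return set().union(*(variables(g) for g in f[1:]))

def holds(f, env):
    """Evaluate f under the truth assignment env (a dict from names to booleans)."""
    if isinstance(f, str):
        return env[f]
    op, *args = f
    if op == "not":
        return not holds(args[0], env)
    if op == "and":
        return all(holds(a, env) for a in args)
    if op == "or":
        return any(holds(a, env) for a in args)
    if op == "implies":
        return (not holds(args[0], env)) or holds(args[1], env)
    raise ValueError(f"unknown connective: {op}")

def assignments(names):
    """All 2^n truth assignments to the given variable names."""
    names = sorted(names)
    for values in product([False, True], repeat=len(names)):
        yield dict(zip(names, values))

def valid(f):
    return all(holds(f, env) for env in assignments(variables(f)))

def satisfiable(f):
    return any(holds(f, env) for env in assignments(variables(f)))

def provable(axioms, f):
    """w is provable from A iff (A -> w) is valid, with A read as a conjunction."""
    return valid(("implies", ("and", *axioms), f))

# (P and (Q or not R)) -> S
w = ("implies", ("and", "P", ("or", "Q", ("not", "R"))), "S")
print(valid(w), satisfiable(w), provable(["P", ("implies", "P", "Q")], "Q"))
```

Each of the three tests may examine every one of the 2^n assignments, which is the source of the exponential cost discussed next.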

Unfortunately, if w contains n variables, then there are 2^n ways of
assigning values to those variables. So any algorithm that tries them all takes time that
grows exponentially in the size of w. The best known algorithms for answering any of
these questions about an arbitrary wff do take exponential time in the worst case.
We'll return to this issue when we consider complexity in Part V. However, we should
note that there are techniques that can perform better than exponentially in many
cases. One approach (described in B.1.3) represents formulas as ordered binary decision
diagrams (OBDDs).


22.4.2  First-Order  Logical  Theories 

If we consider first-order logic (FOL) sentences, such as ∀x (∃y (P(x, y) ∧ Q(y, x) →
T(y))), then none of the questions we asked about Boolean logic (validity, satisfiability,
and theoremhood) is decidable.

We'll focus here on the question of deciding, given a sentence w and a decidable
set of axioms A, whether w can be proved from A. To do this, we'll define the
language:

• FOLtheorem = {<A, w> : A is a decidable set of axioms in first-order logic, w is a
sentence in first-order logic, and w is entailed by A}.

Note that we do not require that the set of axioms be finite, but we do require that it
be decidable. For example, Peano arithmetic is a first-order logical theory that describes
the natural numbers, along with the functions plus and times applied to them.
Peano arithmetic exploits an infinite but decidable set of axioms.

THEOREM 22.3 First-Order Logic is Semidecidable

Theorem: FOLtheorem = {<A, w> : A is a decidable set of axioms in first-order
logic, w is a sentence in first-order logic, and w is entailed by A} is in SD.

Proof: The algorithm proveFOL semidecides FOLtheorem:

proveFOL(A: decidable set of axioms, w: sentence) =

1.  Using  some  complete  set  of  inference  rules  for  first-order  logic,  begin  with 
the  sentences  in  A  and  lexicographically  enumerate  the  sound  proofs.  If  A 
is  infinite,  then  it  will  be  necessary  to  embed  in  that  process  a  subroutine 
that  lexicographically  enumerates  the  sentences  in  the  language  of  the 
theory  of  A  and  checks  each  to  determine  whether  or  not  it  is  an  axiom. 

2.  Check  each  proof  as  it  is  created.  If  it  succeeds  in  proving  w,  halt  and 
accept. 

By Gödel's Completeness Theorem, we know that there does exist a set of inference
rules for first-order logic that is complete in the sense that they are able
to derive, from a set of axioms A, all sentences that are entailed by A. So step 1 of
proveFOL can be correctly implemented.


There exist techniques for implementing proveFOL in a way that is computationally
efficient enough for many practical applications. We describe one
of them, resolution, in B.2.2.


Unfortunately, proveFOL is not a decision procedure since it may not halt. Also, unfortunately,
it is not possible to do better, as we now show.


Logical reasoning provides a basis for many artificial intelligence systems.
Does the fact that first-order logic is undecidable mean that artificial intelligence
is impossible? (M.2.4)


THEOREM  22.4  First-Order  Logic  is  Not  Decidable 

Theorem: FOLtheorem = {<A, w> : A is a decidable set of axioms in first-order
logic, w is a sentence in first-order logic, and w is entailed by A} is not in D.

Proof: Let T be any first-order theory with A (a decidable set of axioms) and M (an
interpretation, i.e., a domain and an assignment of meanings to the constant, predicate,
and function symbols of A). If T is not consistent, then all sentences are theorems
in T. So the simple procedure that always returns True decides whether any
sentence is a theorem in T. We now consider the case in which T is consistent.

If T is complete then, for any sentence w, either w or ¬w is a theorem. So the set
of theorems is decidable because it can be decided by the following algorithm:

decidecompletetheory(A: set of axioms, w: sentence) =

1. In parallel, use proveFOL, as defined above, to attempt to prove w and
¬w.

2. One of the proof attempts will eventually succeed. If the attempt to
prove w succeeded, then return True. If the attempt to prove ¬w succeeded,
then return False.

A slightly different way to say this is that if the set of theorems is in SD and the
set of nontheorems is also in SD, then by Theorem 20.6, both sets are also in D.
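
The idea of running two semideciding procedures in parallel can be sketched as a simple alternation of steps. The sketch does not implement proveFOL. Instead, each semidecider is modeled (as an assumption) by a Python generator that yields False after every unsuccessful step and True when it accepts, and a toy pair of searches over the natural numbers stands in for the two proof attempts.

```python
from itertools import count

def search(is_witness):
    """A toy semidecider: examine 0, 1, 2, ... and accept once a witness is found.

    It yields False after each failed step and may run forever if no witness exists.
    """
    for k in count():
        if is_witness(k):
            yield True
            return
        yield False

def decide_by_dovetailing(attempt_w, attempt_not_w):
    """Alternate single steps of two semideciders, exactly one of which must accept."""
    while True:
        if next(attempt_w):
            return True
        if next(attempt_not_w):
            return False

def is_perfect_square(n):
    # Stand-in for "prove w": look for k with k*k == n.
    # Stand-in for "prove not w": look for k with k*k < n < (k+1)*(k+1).
    return decide_by_dovetailing(
        search(lambda k: k * k == n),
        search(lambda k: k * k < n < (k + 1) * (k + 1)),
    )

print(is_perfect_square(16), is_perfect_square(15))
```

Neither search alone always halts, but because exactly one of them must succeed, the alternation always does.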

But we must also consider the case in which T is not complete. Now it is possible
that neither w nor ¬w is a theorem. Does there exist a decision procedure to
determine whether w is a theorem? The answer is no, which we will show by exhibiting
one particular theory for which no decision procedure exists. We use the
theory of Peano arithmetic. Gödel proved (in a result that has come to be known
as Gödel's Incompleteness Theorem) that the theory of Peano arithmetic cannot
be both consistent and complete.

Following Turing's argument, we show that Hε ≤M FOLtheorem and so
FOLtheorem is not in D. Let R be a mapping reduction from Hε = {<M> : Turing
machine M halts on ε} to FOLtheorem defined as follows:

R(<M>)  = 

1. From (<M>), construct a sentence F in the language of Peano arithmetic,
such that F is a theorem, provable given the axioms of Peano
arithmetic, iff M halts on ε.

2. Let P be the axioms of Peano arithmetic. Return <P, F>.

If Oracle exists and decides FOLtheorem, then C = Oracle(R(<M>))
decides Hε:

• There exists an algorithm to implement R. It is based on the techniques described
by Turing (although he actually proved first that, because Hε is undecidable,
it is also undecidable whether a Turing machine ever prints 0. He then
showed how to create a logical expression that is a theorem of Peano arithmetic
iff a Turing machine ever prints 0). We omit the details.

•  C  is  correct: 

• If <M> ∈ Hε: M halts on ε. F is a theorem of Peano arithmetic.
Oracle(<P, F>) accepts.

• If <M> ∉ Hε: M does not halt on ε. F is not a theorem of Peano arithmetic.
Oracle(<P, F>) rejects.

But no machine to decide Hε can exist, so neither does Oracle.


Is  it  decidable,  given  a  system  of  laws,  whether  some  consequence  follows 
from  those  laws?  (M.2.5) 


Keep in mind that the fact that FOLtheorem is undecidable means only that there is
no algorithm to decide whether an arbitrary sentence is a theorem in an arbitrary
theory. FOLtheorem is semidecidable and the algorithm, proveFOL, that we described
in the proof of Theorem 22.3, provides a way to discover a proof if one exists. Although
efficiency issues arise, we shouldn't write off first-order systems as practical
tools, despite the negative results that we have just shown.

Also note that, just as the unsolvability of the halting problem doesn't say that there
are not some cases in which we can show that a program halts or other cases in which
we can show that it doesn't, the fact that FOLtheorem is undecidable doesn't prove that
there are not some theories for which it is possible to decide theoremhood.

For example, consider Presburger arithmetic, a theory of the natural numbers and
the single function plus. The following is a theorem of Presburger arithmetic (where
"number" means natural number):

• The sum of two odd numbers is even:

∀x (∀y ((∃u (x = u + u + 1) ∧ ∃v (y = v + v + 1)) → ∃z (x + y = z + z))).

Presburger arithmetic is decidable (although unfortunately, as we will see in Section
28.9.3, no efficient procedure for deciding it exists).


Because Presburger arithmetic is decidable, it has been used as a basis for
verification systems that prove the correctness of programs. We'll say more
about program verification in H.1.


22.5  Undecidable  Problems  about  Context-Free  Languages 

Recall from Chapter 9 that we were able to find a decision procedure for all of the
questions that we asked about regular languages. We have just seen, at the other extreme,
that almost all the questions we ask about Turing machines and the languages
they define are undecidable. What about context-free languages? In Chapter 14, we
described two decision procedures for them:

1. Given a CFL L and a string s, is s ∈ L?

2. Given a CFL L, is L = ∅?

What about other questions we might like to ask, including:

3. Given a CFL L, is L = Σ*?

4. Given two CFLs L1 and L2, is L1 = L2?

5. Given two CFLs L1 and L2, is L1 ⊆ L2?

6. Given a CFL L, is ¬L context-free?

7. Given a CFL L, is L regular?

8. Given two CFLs L1 and L2, is L1 ∩ L2 = ∅?

9. Given a CFL L, is L inherently ambiguous?

Since we have proven that there exists a grammar that generates L iff there exists a
PDA that accepts it, these questions will have the same answers whether we ask them
about grammars or about PDAs. In addition, there are questions that we might like to
ask specifically about PDAs, including:

10. Given two PDAs M1 and M2, is M2 a minimization of M1? Define M2 to be a
minimization of M1 iff L(M1) = L(M2) and there exists no other PDA M' such
that L(M2) = L(M') and M' has fewer states than M2 has.

And  there  are  other  questions  specifically  about  grammars,  including: 

11.  Given  a  CFG  G,  is  G  ambiguous? 

Questions 3-11 are all undecidable. Alternatively, if these problems are stated as
languages, the languages are not in D. Keep in mind, however, that just as there are
programs that can be shown to halt (or not to halt), there are context-free languages
about which various properties can be proven. For example, although question 11 is
undecidable (for an arbitrary CFG), some grammars can easily be shown to be ambiguous
by finding a single string for which two parses exist. And other grammars
can be shown to be unambiguous, for example by showing that they are LL(1), as
described in Section 15.2.3.

There are two strategies that we can use to show that these problems are in general
undecidable. The first is to exploit the idea of a computation history to enable us to reduce
H to one of these problems. The second is to show that a problem is not in D by
reduction from the Post Correspondence Problem. We will use both, starting with the
computation history approach.

22.5.1  Reduction  via  Computation  History 

We will first show that question 3 is undecidable by reducing H to the language
CFGALL = {<G> : G is a CFG and L(G) = Σ*}. To do this reduction, we will have to
introduce a new technique in which we create strings that correspond to the computation
history of some Turing machine M. But, once we have shown that CFGALL is not in D, the
proofs of claims 4, 5, and 10 are quite straightforward. They use reduction from CFGALL.

Recall from Section 17.1 that a configuration of a Turing machine M is a 4-tuple
(M's current state, the nonblank portion of the tape before the read/write head, the
character under the read/write head, the nonblank portion of the tape after the
read/write head).

A computation of M is a sequence of configurations C0, C1, ..., Cn for some n ≥ 0
such that C0 is the initial configuration of M, Cn is a halting configuration of M, and

C0 |-M C1 |-M C2 |-M ... |-M Cn.

Notice that, under this definition, a computation is a finite sequence of configurations,
the last of which must be a halting configuration. So, if M does not halt when
started in configuration C0, there exists no computation that starts in C0. That doesn't
mean that M can't compute from C0. It just means that there is no finite sequence that
records what it does and it is that sequence that we are calling a computation.

A computation history of M is a string that encodes a computation. We will write
each configuration in the history as a 4-tuple, as described above. Then we will encode
the entire history by concatenating the configurations together. So, assuming that s is
M's start state and h is a halting state, here's an example of a string that could represent
a computation history of M:

(s, ε, □, abba)(q1, ε, a, bba)(q2, a, b, ba)(q2, ab, b, a)(q2, abb, a, ε)(h, abba, □, ε).

THEOREM 22.5 CFGALL is Undecidable

Theorem: The language CFGALL = {<G> : G is a CFG and L(G) = Σ*} is not in D.

Proof: We show that CFGALL is not in D by reduction from H = {<M, w> :
Turing machine M halts on input string w}. The reduction we will use exploits
two functions, R and ¬. R will map instances of H to instances of CFGALL. As in
the proof of Theorem 21.9, ¬ will simply invert Oracle's response (turning an
accept into a reject and vice versa).

The idea behind R is that it will build a grammar G that generates the language
L# composed of all strings in Σ* except any that represent a computation
history of M on w. If M does not halt on w, there are no computation histories of
M on w (since a computation history must be of finite length and end in a halting
state) so G generates Σ* and Oracle will accept. If, on the other hand, there exists
a computation history of M on w, then there will be a string that G will not generate
and Oracle will reject. So Oracle makes the correct distinction but accepts
when we need it to reject and vice versa. But since Oracle is a deciding machine,
¬ can invert its response.

It  turns  out  to  be  easier  for  R  to  build  a  PDA  to  accept  L#  than  it  is  to  build 
a  grammar  to  generate  it.  But  we  have  an  algorithm  to  build,  from  any  PDA, 
the  corresponding  grammar.  So  R  will  first  build  a  PDA  P,  then  convert  P  to  a 
grammar. 

In order for a string s to be a computation history of M on w, it must possess
four properties:

1. It must be a syntactically valid computation history.

2. C0 must correspond to M being in its start state, with w on the tape, and
with the read/write head positioned just to the left of w.

3. The last configuration must be a halting configuration.

4. Each configuration after C0 must be derivable from the previous one according
to the rules in M's transition relation δM.

We want P to accept any string that is not a computation history of M on w. So
if P finds even one of these conditions violated it will accept. P will nondeterministically
choose which of the four conditions to check. It can then check the one it
picked as follows:

1. We can write a regular expression to define the syntax of the language of
computation histories. So P can easily check for property 1 and accept if
the string is ill-formed.

2. R builds P from a particular <M, w> pair, so it can hardwire into P what the initial
configuration would have to be if s is to be a computation history of
M on w.

3. Again, R can hardwire into P what a halting configuration of M is, namely
one in which M is in some state in HM.

4. This is the only hard one. To show that a string s is not composed of configurations
that are derivable from each other, it suffices to find even one adjacent
pair where the second configuration cannot be derived from the first.
So P can nondeterministically pick one configuration and then check to see
whether the one that comes after it is not correct, according to the rules
of δM.

But how, exactly, can we implement the test for property 4? Suppose that we
have an adjacent pair of configurations. If they are part of a computation history
of M, then they must be identical except:

• The state of the second must have changed as specified in δM.

• Right around the read/write head, the change specified by δM must have occurred
on the tape.

So, for example, it is possible, given an appropriate δM, that the following
string could be part of a computation history:

(q1, aaaa, b, aaaa)(q2, aaa, a, baaaa).

Here M moved the read/write head one square to the left. But it is not possible
for the following string to be part of any computation history:

(q1, aaaa, b, aaaa)(q2, bbbb, a, bbbb).

M cannot change any squares other than the one directly under its read/write
head.
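
The relationship between adjacent configurations can be made concrete with a short sketch. It is not the PDA P. It simply checks properties 2 through 4 directly, under several assumptions of ours: a configuration is a Python 4-tuple of strings, the blank is written □, M is deterministic, and delta is a dictionary mapping (state, character) to (new state, character written, "L" or "R").

```python
BLANK = "□"

def step(config, delta):
    """Return the configuration that follows config, or None if M has no move."""
    state, left, head, right = config
    if (state, head) not in delta:
        return None
    new_state, written, move = delta[(state, head)]
    if move == "R":
        left = left + written
        head, right = (right[0], right[1:]) if right else (BLANK, "")
    else:  # move == "L"
        right = written + right
        left, head = (left[:-1], left[-1]) if left else ("", BLANK)
    # Only the nonblank portions on either side of the head are recorded.
    return (new_state, left.lstrip(BLANK), head, right.rstrip(BLANK))

def yields(c1, c2, delta):
    """Property 4 for one adjacent pair: c2 must be the successor of c1."""
    return step(c1, delta) == c2

def is_history(configs, delta, start, halting, w):
    """Check properties 2, 3, and 4 (property 1 is implied by the tuple representation)."""
    if not configs or configs[0] != (start, "", BLANK, w):
        return False
    if configs[-1][0] not in halting:
        return False
    return all(yields(a, b, delta) for a, b in zip(configs, configs[1:]))

# A hypothetical transition function, chosen so that the first pair above is legal.
delta = {("q1", "b"): ("q2", "b", "L")}
print(yields(("q1", "aaaa", "b", "aaaa"), ("q2", "aaa", "a", "baaaa"), delta))
print(yields(("q1", "aaaa", "b", "aaaa"), ("q2", "bbbb", "a", "bbbb"), delta))
```

The first pair passes and the second fails, since the second changes squares that are not under the read/write head.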

So P must read the first configuration, remember it, and then compare it to the
second. Since a configuration can be of arbitrary length and P is a PDA, the only
way P can remember a configuration is on the stack. But then it has a problem.
When it tries to pop off the symbols from the first configuration to compare it to
the second, they will be backwards.

To  solve  this  problem,  we  will  change  slightly  our  statement  of  the  language 
that  P  will  accept.  Now  il  will  be  B#.  the  boustrophedon  version  of  L#.  In  B#, 
every  odd  numbered  configuration  will  be  written  backwards.  The  word  “bous¬ 
trophedon"  aptly  describes  B#.  It  is  derived  from  a  Greek  word  that  means  turn¬ 
ing  as  oxen  do  in  plowing.  It  is  used  to  describe  a  writing  scheme  in  which 
alternate  lines  are  written  left  to  right  and  then  right  to  left  (so  that  the  scribe 
wastes  no  effort  moving  his  hand  across  the  page  without  writing).  With  this 
change.  P  can  compare  two  adjacent  configurations  and  determine  whether  one 
could  have  been  derived  from  the  other  via  5. 
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Boustrophedon  writing  0  has  been  used  in  ancient  Greek  texts  and  for  in¬ 
scriptions  on  statues  on  Easter  Island.  Much  more  recently,  dot  matrix  print¬ 
ers  used  back  and  forth  writing,  but  they  adjusted  their  fonts  so  that  it  was 
not  the  case  that  every  other  line  appeared  backwards. 
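
The reason that reversing every other configuration helps can be seen in a small sketch. It assumes a stack modeled as a Python list and simplifies the comparison to exact equality (the real P must also allow the local change around the read/write head). The function name matches_when_reversed is illustrative and does not come from the text.

```python
def matches_when_reversed(first: str, second: str) -> bool:
    """Push `first`, then pop one symbol for each symbol of `second`.

    Popping returns the symbols of `first` in reverse order, so the check
    succeeds exactly when `second` is `first` written backwards.
    """
    stack = list(first)          # push the first configuration, left to right
    for symbol in second:        # read the second configuration, left to right
        if not stack or stack.pop() != symbol:
            return False
    return not stack             # both must run out at the same time


print(matches_when_reversed("abbab", "babba"))  # second is first reversed
print(matches_when_reversed("abbab", "abbab"))  # same orientation
```

The first call succeeds and the second fails, which is why a PDA can check adjacent configurations only when alternate ones are written backwards.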


We  are  now  ready  to  state: 

R(<M, w>) =

1. Construct the description of a PDA P that accepts all strings in B#.

2. From P, construct a grammar G that generates L(P).

3. Return <G>.

{R, ¬} is a reduction from H to CFGALL. If Oracle exists and decides CFGALL,
then C = ¬Oracle(R(<M, w>)) decides H. R can be implemented as a Turing
machine. And C is correct:

• If <M, w> ∈ H: M halts on w. So there exists a computation history of M
on w. So there is a string that G does not generate. Oracle(<G>) rejects. C
accepts.

• If <M, w> ∉ H: M does not halt on w, so there exists no computation
history of M on w. G generates Σ*. Oracle(<G>) accepts. C rejects.

But  no  machine  to  decide  H  can  exist,  so  neither  does  Oracle. 


.5.2  Using  the  Undecidability  of  CFGALL 

Now  that  we  have  proven  our  first  result  about  the  undecidability  of  a  question  about 
context-free  grammars,  others  can  be  proven  by  reduction  from  it.  For  example: 


THEOREM  22.6  "Are  Two  CFGs  Equivalent?"  is  Undecidable 


Theorem: The language GG= = {<G1, G2> : G1 and G2 are CFGs and L(G1) =
L(G2)} is not in D.

Proof: We show that CFGALL ≤M GG= and so GG= is not in D. Let R be a mapping
reduction from CFGALL to GG= defined as follows:

R(<G>) =

1. Construct the description <G#> of a new grammar G# that generates Σ*.

2. Return <G#, G>.

If Oracle exists and decides GG=, then C = Oracle(R(<G>)) decides CFGALL:

• R can be implemented as a Turing machine.

• C is correct:

   • If <G> ∈ CFGALL: G is equivalent to G#, which generates everything.
   Oracle(<G#, G>) accepts.

   • If <G> ∉ CFGALL: G is not equivalent to G#, which generates
   everything. Oracle(<G#, G>) rejects.

But no machine to decide CFGALL can exist, so neither does Oracle.
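
The mapping R used in this proof is simple enough to sketch directly. The representation below is an assumption made for illustration: a grammar is a pair (start symbol, rules), each rule's right-hand side is a string, the empty string stands for ε, and the alphabet does not contain the symbol S. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Grammar = Tuple[str, Dict[str, List[str]]]   # (start symbol, rules)


def sigma_star_grammar(alphabet: str) -> Grammar:
    """Build G#: S -> aS for every a in the alphabet, plus S -> ε."""
    rules = {"S": [a + "S" for a in alphabet] + [""]}
    return ("S", rules)


def reduce_cfgall_to_gg_eq(g: Grammar, alphabet: str) -> Tuple[Grammar, Grammar]:
    """The mapping R: pair a grammar for Σ* with the input grammar G."""
    return (sigma_star_grammar(alphabet), g)


g = ("T", {"T": ["aT", ""]})                  # an arbitrary input grammar
g_sharp, same_g = reduce_cfgall_to_gg_eq(g, "ab")
print(g_sharp)
```

R only builds a pair of descriptions. All of the difficulty sits in the hypothetical Oracle that would have to compare the two languages.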

THEOREM  22.7  "Is  One  CFL  a  Subset  of  Another?"  is  Undecidable 

Theorem: The language {<G1, G2> : G1 and G2 are context-free grammars and
L(G1) ⊆ L(G2)} is not in D.

Proof:  The  proof  is  by  reduction  from  GG=  and  is  left  as  an  exercise. 


The undecidability of so many questions about context-free languages makes
optimizing programs to work with them more difficult than optimizing FSMs. For example,
in Chapter 5 we described an algorithm for minimizing DFSMs. But now, in discussing
context-free languages and PDAs, we must accept that the problem of determining
whether one PDA is a minimization of another is undecidable. This result can be
proved quite easily by reduction from CFGALL.


THEOREM  22.8  "Is  One  PDA  a  Minimization  of  Another?"  is  Undecidable 

Theorem: The language PDAMIN = {<M1, M2> : PDA M2 is a minimization of
PDA M1} is undecidable.

Proof: We show that CFGALL ≤M PDAMIN and so PDAMIN is not in D. Before we
start the reduction, recall that M2 is a minimization of M1 iff:

(L(M1) = L(M2)) ∧ M2 is minimal.

Let R be a mapping reduction from CFGALL to PDAMIN defined as follows:

R(<G>) =

1. Invoke cfgtoPDAtopdown(G) to construct the description <P> of a
PDA that accepts the language that G generates.

2. Write <P#> such that P# is a PDA with a single state s that is both the
start state and an accepting state. Make a transition from s back to itself
on each input symbol. Never push anything onto the stack. Note that
L(P#) = Σ* and P# is minimal.

3. Return <P, P#>.

If Oracle exists and decides PDAMIN, then C = Oracle(R(<G>)) decides
CFGALL. R can be implemented as a Turing machine. And C is correct:

• If <G> ∈ CFGALL: L(G) = Σ*. So L(P) = Σ*. Since L(P#) = Σ*,
L(P) = L(P#). And P# is minimal. Thus P# is a minimization of P.
Oracle(<P, P#>) accepts.

• If <G> ∉ CFGALL: L(G) ≠ Σ*. So L(P) ≠ Σ*. But L(P#) = Σ*. So
L(P) ≠ L(P#). So Oracle(<P, P#>) rejects.

But  no  machine  to  decide  CFGALL  can  exist,  so  neither  does  Oracle. 


.5.3  Reductions  from  PCP 

Several of the context-free language problems that we listed at the beginning of this
section can be shown not to be decidable by showing that the Post Correspondence
Problem is reducible to them. The key observation that forms the basis for those
reductions is the following: Consider P, a particular instance of PCP. If P has a solution, then
there is a sequence of indexes that makes it possible to generate the same string from
the X list and from the Y list. If there isn't a solution, then there is no such sequence.

We start by defining a mapping between instances of PCP and context-free grammars.
Recall that, given some nonempty alphabet Σ, an instance of PCP is a string of the form:

<P> = (x1, x2, x3, ..., xn)(y1, y2, y3, ..., yn), where ∀j (xj ∈ Σ+ and yj ∈ Σ+).

To encode solutions to P, we'll need a way to represent the integers that correspond
to the indexes. Since all the integers must be in the range 1:n, we can do this with n
symbols. So let Σn be a set of n symbols such that Σ ∩ Σn = ∅. We'll use the jth
element of Σn to encode the integer j.

Given any PCP instance P, we'll define a grammar Gx that generates one string for
every candidate solution to P. The string will have a second half that is the sequence of
indices that is the solution, except that that sequence will be reversed. The first half of
the string will be the concatenation of the elements from the X list that were selected
by the indices. So suppose that x1 = aaa, x2 = bbc, and x3 = dd. Then the index
sequence 1, 2, 3 would produce the string aaabbcdd. So Gx will generate the string
aaabbcdd321. (Note that the index sequence appears reversed.) We'll also build the
grammar Gy, which does the same thing for the sequences that can be formed from the
Y list. Note that there is no commitment at this point that any of the strings generated
by either Gx or Gy corresponds to a solution of P. What we'll see in a moment is that a
string s corresponds to such a solution iff it is generated by both Gx and Gy.

More formally, for any PCP instance P, define the following two grammars Gx and Gy:

• Gx = ({Sx} ∪ Σ ∪ Σn, Σ ∪ Σn, Rx, Sx), where Rx contains the following two rules
for each value of i between 1 and n:

Sx → xi Sx i
Sx → xi i

In both rules, i is represented by the ith element of Σn.

• Gy = ({Sy} ∪ Σ ∪ Σn, Σ ∪ Σn, Ry, Sy), where Ry contains the following two rules
for each value of i between 1 and n:

Sy → yi Sy i
Sy → yi i

In both rules, i is represented by the ith element of Σn.

Every string that Gx generates will be of the form $x_{i_1} x_{i_2} \cdots x_{i_k} (i_1, i_2, \ldots, i_k)^R$. Every
string that Gy generates will be of the form $y_{i_1} y_{i_2} \cdots y_{i_k} (i_1, i_2, \ldots, i_k)^R$.

Any solution to P is a finite sequence $i_1, i_2, \ldots, i_k$ of integers such that:

$\forall j\, (1 \le i_j \le n \text{ and } x_{i_1} x_{i_2} \cdots x_{i_k} = y_{i_1} y_{i_2} \cdots y_{i_k})$.

If any such solution $i_1, i_2, \ldots, i_k$ exists, let $w = x_{i_1} x_{i_2} \cdots x_{i_k} = y_{i_1} y_{i_2} \cdots y_{i_k}$. Then both
Gx and Gy will generate the string:

$w (i_1, i_2, \ldots, i_k)^R$.

EXAMPLE  22.6  Defining  Grammars  for  a  PCP  Instance 

Consider PCP4, defined as follows:

   i    X        Y
   1    b        bab
   2    abb      b
   3    aba      a
   4    bbaaa    babaaa

PCP4 is represented as the string (b, abb, aba, bbaaa)(bab, b, a, babaaa).

   The rules in Gx                  The rules in Gy
   Sx → bSx1, Sx → b1               Sy → babSy1, Sy → bab1
   Sx → abbSx2, Sx → abb2           Sy → bSy2, Sy → b2
   Sx → abaSx3, Sx → aba3           Sy → aSy3, Sy → a3
   Sx → bbaaaSx4, Sx → bbaaa4       Sy → babaaaSy4, Sy → babaaa4

Gx generates strings of the form w v^R, where w is a sequence of strings from
column X and v is the sequence of indices that were used to form w. Gy does the
same for strings from column Y. So, for example, since 1, 2, 3, 1 is a solution to
PCP4, Gx can generate the following string (with blanks inserted to show the
structure and with the index sequence reversed):

b abb aba b 1321

Gy can also generate that string, although it derives it differently:

bab b a bab 1321
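
The construction of Gx and Gy is mechanical, as the following sketch suggests. It assumes that each index i is encoded by its decimal digit (so n ≤ 9 and Σ contains no digits) and that a rule is a pair (left-hand side, tuple of right-hand-side symbols). The names pcp_grammar_rules and derived_string are illustrative.

```python
from typing import List, Sequence, Tuple

Rule = Tuple[str, Tuple[str, ...]]   # (left-hand side, right-hand side symbols)


def pcp_grammar_rules(strings: Sequence[str], start: str) -> List[Rule]:
    """For each index i, emit  start -> s_i start i  and  start -> s_i i."""
    rules: List[Rule] = []
    for i, s in enumerate(strings, start=1):
        rules.append((start, tuple(s) + (start, str(i))))
        rules.append((start, tuple(s) + (str(i),)))
    return rules


def derived_string(strings: Sequence[str], indices: Sequence[int]) -> str:
    """The string derived by using the rules for `indices`, outermost first."""
    body = "".join(strings[i - 1] for i in indices)
    tail = "".join(str(i) for i in reversed(indices))
    return body + tail


X = ["b", "abb", "aba", "bbaaa"]
Y = ["bab", "b", "a", "babaaa"]
print(pcp_grammar_rules(X, "Sx")[:2])
print(derived_string(X, [1, 2, 3, 1]), derived_string(Y, [1, 2, 3, 1]))
```

For the index sequence 1, 2, 3, 1 both calls to derived_string produce the same string, which is exactly what it means for that sequence to be a solution.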


Using the ideas that we have just described, we are ready to show that some significant
questions about the context-free languages are undecidable. We'll do so by converting
each question to a language and then exhibiting a reduction from PCP = {<P> : P has
a solution} to that language.


THEOREM  22.9  "Is  the  Intersection  of  Two  CFLs  Empty?"  is  Undecidable 

Theorem: The language IntEmpty = {<G1, G2> : G1 and G2 are context-free
grammars and L(G1) ∩ L(G2) = ∅} is not in D.

Proof: We show that IntEmpty is not in D by reduction from PCP = {<P> : P has
a solution}. The reduction we will use exploits two functions, R and ¬. R will map
instances of PCP to instances of IntEmpty. As before, ¬ will simply invert
Oracle's response (turning an Accept into a Reject and vice versa).

Define R as follows:

R(<P>) =

1. From P construct Gx and Gy as described above.

2. Return <Gx, Gy>.

{R, ¬} is a reduction from PCP to IntEmpty. If Oracle exists and decides
IntEmpty, then C = ¬Oracle(R(<P>)) decides PCP. R and ¬ can be implemented
as Turing machines. And C is correct:

• If <P> ∈ PCP: P has at least one solution. So both Gx and Gy will generate
some string:

$w (i_1, i_2, \ldots, i_k)^R$, where $w = x_{i_1} x_{i_2} \cdots x_{i_k} = y_{i_1} y_{i_2} \cdots y_{i_k}$.

So L(Gx) ∩ L(Gy) ≠ ∅. Oracle(<Gx, Gy>) rejects, so C accepts.

• If <P> ∉ PCP: P has no solution. So there is no string that can be generated
by both Gx and Gy. So L(Gx) ∩ L(Gy) = ∅. Oracle(<Gx, Gy>) accepts, so C
rejects.


But  no  machine  to  decide  PCP  can  exist,  so  neither  does  Oracle. 
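
Although no algorithm decides whether L(Gx) ∩ L(Gy) is empty, a bounded search can still find witnesses when they are short. The sketch below is not a decision procedure: it assumes an arbitrary cutoff max_len, and a result of None only means that no solution of that length or shorter exists. The name find_solution is illustrative.

```python
from itertools import product
from typing import Optional, Sequence, Tuple


def find_solution(xs: Sequence[str], ys: Sequence[str],
                  max_len: int) -> Optional[Tuple[int, ...]]:
    """Return the first index sequence of length <= max_len on which the
    X list and the Y list spell the same string, or None if there is none."""
    n = len(xs)
    for k in range(1, max_len + 1):
        for indices in product(range(1, n + 1), repeat=k):
            x_string = "".join(xs[i - 1] for i in indices)
            y_string = "".join(ys[i - 1] for i in indices)
            if x_string == y_string:
                return indices
    return None


X = ["b", "abb", "aba", "bbaaa"]
Y = ["bab", "b", "a", "babaaa"]
print(find_solution(X, Y, 4))
```

Every sequence that the search returns corresponds to a string that both Gx and Gy generate, so it witnesses that the intersection is nonempty. On PCP4 the search stops at the two-index sequence 1, 2, since b abb and bab b both spell babb.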

In Chapter 11 we spent a good deal of time worrying about whether the context-free
grammars that we built were unambiguous. Yet we never gave an algorithm to
determine whether or not a context-free grammar was ambiguous. Now we can understand
why. No such algorithm exists.


THEOREM  22.10  "Is  a  CFG  Ambiguous?"  is  Undecidable 

Theorem: The language CFGUNAMBIG = {<G> : G is a context-free grammar and
G is ambiguous} is not in D.

Proof: We show that PCP ≤M CFGUNAMBIG and so CFGUNAMBIG is not in D. Let R
be a mapping reduction from PCP to CFGUNAMBIG defined as follows:

R(<P>) =

1. From P construct Gx and Gy as described above.

2. Construct G as follows:

2.1. Add to G all the symbols and rules of both Gx and Gy.

2.2. Add a new start symbol S and the two rules S → Sx and S → Sy.

3. Return <G>.

G generates L(Gx) ∪ L(Gy). Further, it does so by generating all the
derivations that Gx can produce as well as all the ones that Gy can produce, except that
each has a prepended S ⇒ Sx or S ⇒ Sy.

If Oracle exists and decides CFGUNAMBIG, then C = Oracle(R(<P>))
decides PCP. R can be implemented as a Turing machine. And C is correct:

• If <P> ∈ PCP: P has at least one solution. So both Gx and Gy will generate
some string:

$w (i_1, i_2, \ldots, i_k)^R$, where $w = x_{i_1} x_{i_2} \cdots x_{i_k} = y_{i_1} y_{i_2} \cdots y_{i_k}$.

So G can generate that string in two different ways. G is ambiguous.
Oracle(<G>) accepts.

• If <P> ∉ PCP: P has no solution. So there is no string that can be generated
by both Gx and Gy. Since both Gx and Gy are unambiguous, so is G.
Oracle(<G>) rejects.

But  no  machine  to  decide  PCP  can  exist,  so  neither  does  Oracle. 
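
The link between PCP solutions and ambiguity can be checked on small cases by counting derivations in G. The sketch assumes the same digit encoding of indices as the grammars above (n ≤ 9, no digits in Σ), and the function names are illustrative. Because the last symbol of a string fixes which rule was applied first, each of Gx and Gy contributes at most one derivation of any string.

```python
from typing import Sequence


def count_derivations(strings: Sequence[str], t: str) -> int:
    """Number of derivations of t in the grammar with rules
    S -> s_i S i  and  S -> s_i i  for each index i."""
    total = 0
    for i, s in enumerate(strings, start=1):
        # t must look like s_i ... i, with the two ends not overlapping.
        if len(t) > len(s) and t.startswith(s) and t.endswith(str(i)):
            middle = t[len(s):-1]
            if middle == "":
                total += 1                                   # S -> s_i i
            else:
                total += count_derivations(strings, middle)  # S -> s_i S i
    return total


def count_in_union(xs: Sequence[str], ys: Sequence[str], t: str) -> int:
    """Derivations of t in G, whose start rules are S -> Sx and S -> Sy."""
    return count_derivations(xs, t) + count_derivations(ys, t)


X = ["b", "abb", "aba", "bbaaa"]
Y = ["bab", "b", "a", "babaaa"]
print(count_in_union(X, Y, "babbabab1321"))
print(count_in_union(X, Y, "b1"))
```

A count of two for a single string means that G derives it once through Sx and once through Sy, which is the ambiguity that the proof exploits.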


Exercises 

1. Solve the linear Diophantine farmer problem presented in Section 22.1.

2. Consider the following instance of the Post Correspondence Problem. Does it
have a solution? If so, show one.

   i    X      Y
   1    a      bab
   2    bbb    bb
   3    aab    ab
   4    b      a

3. Prove that, if we consider only PCP instances with a single character alphabet,
PCP is decidable.

4. Prove that, if an instance of the Post Correspondence Problem has a solution, it
has an infinite number of solutions.

5. Recall that the size of an instance P of the Post Correspondence Problem is the
number of strings in its X list. Consider the following claim about the Post
Correspondence problem: For any n, if P is a PCP instance of size n and if no string in
either its X or its Y list is longer than n, then, if P has any solutions, it has one of
length less than or equal to 2^n. Is this claim true or false? Prove your answer.

6. Let TILES = {<T> : any finite surface on the plane can be tiled, according to the
rules described in the book, with the tile set T}. Let s be the string that encodes the
following tile set:

Is s ∈ TILES? Prove your answer.

7. For each of the following languages L, state whether or not it is in D and prove
your answer.

a. {<G> : G is a context-free grammar and ε ∈ L(G)}.

b. {<G> : G is a context-free grammar and {ε} = L(G)}.

c. {<G1, G2> : G1 and G2 are context-free grammars and L(G1) ⊆ L(G2)}.

d. {<G> : G is a context-free grammar and ¬L(G) is context free}.

e. {<G> : G is a context-free grammar and L(G) is regular}.


CHAPTER
Unrestricted Grammars*


Consider a language like AnBnCn = {a^n b^n c^n : n ≥ 0}. We know that we cannot
write a context-free grammar for it. But could we create a new grammar formalism
that is powerful enough to describe it and other languages like it? The answer
to this question is yes. Recall that we moved from the power to define the regular
languages to the power to define the context-free languages by removing constraints
on the form of the rules that are allowed. We will do that again now. This time we will
remove all constraints. We will prove that the class of languages that can be generated
by one of these new, unrestricted grammars is exactly SD.


23.1  Definition  and  Examples 

An unrestricted grammar G is a quadruple (V, Σ, R, S), where:

• V is an alphabet that may contain terminal and nonterminal symbols.

• Σ (the set of terminals) is a subset of V.

• R (the set of rules) is a finite subset of V+ × V*.

• S (the start symbol) is an element of V - Σ.

Note that now the left-hand side of a rule may contain multiple symbols. So we
might have, for example:

aXa → aaa
bXb → aba

In this case, the derivation of X depends on its context. It is thus common to call
rules like this "context-sensitive." We will avoid using this terminology, however,
because in the next chapter we will describe another formalism that we will call a
context-sensitive grammar. While it, too, allows rules such as these, it does impose one
important constraint that is lacking in the definition of an unrestricted grammar. It is
thus less powerful, in a formal sense, than the system that we are describing here.

An unrestricted grammar G (just like a context-free grammar) derives strings by
applying rules, beginning with its start symbol. So, to describe its behavior, we define the
derives-in-one-step relation (⇒) analogously to the way we defined it for context-free grammars.
Given a grammar G = (V, Σ, R, S), define x ⇒G y to be a binary relation such that:

∀x, y ∈ V* (x ⇒G y iff x = αβδ, y = αγδ, α, δ, and γ ∈ V*, β ∈ V+, and
there is a rule β → γ in R).

Any sequence of the form w0 ⇒G w1 ⇒G w2 ⇒G ... ⇒G wn is called a derivation in
G. As before, ⇒G* is the reflexive, transitive closure of ⇒G.

The language generated by G is {w ∈ Σ* : S ⇒G* w}. So, just as before, L(G) is the
set of all strings of terminal symbols derivable from S via the rules of G.
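
The derives-in-one-step relation translates almost directly into code. The sketch below assumes that every symbol of V is a single character and that a rule is a pair of strings (β, γ) with β nonempty. The name one_step is illustrative.

```python
from typing import List, Set, Tuple

Rule = Tuple[str, str]   # (left-hand side β, right-hand side γ)


def one_step(rules: List[Rule], x: str) -> Set[str]:
    """All y such that x => y: pick a rule β -> γ and one occurrence of β
    in x, and replace that occurrence by γ."""
    results: Set[str] = set()
    for lhs, rhs in rules:
        pos = x.find(lhs)
        while pos != -1:
            results.add(x[:pos] + rhs + x[pos + len(lhs):])
            pos = x.find(lhs, pos + 1)   # overlapping occurrences count too
    return results


rules = [("aXa", "aaa"), ("bXb", "aba")]
print(one_step(rules, "aXaXa"))
```

In the example, the left-hand side aXa occurs twice in aXaXa (the occurrences overlap), so two different strings are derivable in one step.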

Unrestricted grammars are sometimes called phrase structure grammars or type 0
grammars, the latter because of their place in the Chomsky hierarchy (which we will
describe in Section 24.2). Some books also use the term semi-Thue system
synonymously with unrestricted grammar. While the two formalisms are very similar, they
model different computational processes and so must be considered separately. We will
describe semi-Thue systems in Section 23.5.


EXAMPLE  23.1  AnBnCn 

Consider AnBnCn = {a^n b^n c^n : n ≥ 0}. We build a grammar G = (V, {a, b, c},
R, S), where V and R are as described below and L(G) = AnBnCn. We first
observe that any grammar for AnBnCn must generate all and only those strings
with two properties:

• equal numbers of a's, b's, and c's, and

• letters in the correct order.

Just as with context-free grammars, the only way to guarantee that there are
equal numbers of a's, b's, and c's is to generate them in parallel. The problem,
though, is that there is no way to generate them in the correct order. For example,
we could try a rule like:

S → abSc

But if we apply that rule twice, we will generate the string ababScc. So what we
will have to do is to generate each string in two phases:

1. Generate the correct number of each symbol.

2. Move the symbols around until they are in the correct order. This is the
step that is possible in an unrestricted grammar but was not possible in a
context-free one.

But we must be careful. As soon as G has generated a string that contains only
terminal symbols, it is done, and that string is in L(G). So we must make sure that,
until the string is ready, it still contains at least one nonterminal. We can do that by
creating one or more nonterminals that will stand in for their corresponding
terminals and will be replaced once they have been moved into position. We'll use
one such symbol, B.

We begin building the rules of G by inserting into R two rules that will generate
strings with equal numbers of a's, b's, and c's:

1. S → aBSc

2. S → ε

To generate the string a^i b^i c^i, we will apply rule 1 i times. Then we will apply rule
2 once. Suppose we want to generate a^3 b^3 c^3. Then we will apply rule 1 three times,
then rule 2 once, and we will generate aBaBaBccc. Because of the nonterminal
symbol B, this string is not an element of L(G). We still have the opportunity to
rearrange the symbols, which we can do by adding one swapping rule:

3. Ba → aB

Rule 3 can be applied as many times as necessary to push all the a's to the front
of the string. But what forces it to be applied? The answer is the B's. Until all the
B's are gone, the string that has been generated is not in L(G). So we need rules to
transform each B into b. We must design those rules so that they cannot be applied
to any B until it is where it belongs. We can assure that with the following two rules:

4. Bc → bc

5. Bb → bb

Rule 4 transforms the rightmost B into b. Rule 5 transforms a B if it has an
already transformed b directly to its right.
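
One way to gain confidence in these five rules is to enumerate everything that they can derive up to a small length. The following breadth-first sketch does so. It assumes one character per symbol, uses the empty string for ε, and prunes sentential forms longer than max_len + 1. That pruning is safe only for this particular grammar, because rule 2 is the only rule that shortens a form and a form never contains more than one S. The name generated_strings is illustrative.

```python
from collections import deque
from typing import List, Set, Tuple

RULES: List[Tuple[str, str]] = [
    ("S", "aBSc"),  # rule 1
    ("S", ""),      # rule 2 (ε)
    ("Ba", "aB"),   # rule 3
    ("Bc", "bc"),   # rule 4
    ("Bb", "bb"),   # rule 5
]


def generated_strings(max_len: int) -> Set[str]:
    """Terminal strings of length <= max_len that are derivable from S."""
    seen = {"S"}
    queue = deque(["S"])
    found: Set[str] = set()
    while queue:
        form = queue.popleft()
        if form == "" or form.islower():
            found.add(form)              # no nonterminals remain
            continue
        for lhs, rhs in RULES:
            pos = form.find(lhs)
            while pos != -1:
                nxt = form[:pos] + rhs + form[pos + len(lhs):]
                if len(nxt) <= max_len + 1 and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                pos = form.find(lhs, pos + 1)
    return {s for s in found if len(s) <= max_len}


print(sorted(generated_strings(6)))
```

Up to length 6 the search finds only ε, abc, and aabbcc. This is evidence rather than proof, since it says nothing about longer strings.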


Having written a grammar such as the one we just built for AnBnCn, can we prove
that it is correct (i.e., that it generates exactly the strings in the target language)? Yes.
Just as with a context-free grammar, we can prove that a grammar G is correct by:

1. showing that G generates only strings in L, and

2. showing that G generates all the strings in L.

We show 1 by defining an invariant I that is true of S and is maintained each time a
rule in G is fired. Call the string that is being derived st. Then to prove the correctness
of the grammar we just showed for AnBnCn, we let I be:

#a(st) = #b(st) + #B(st) = #c(st) ∧ all c's occur to the right of all a's, b's,
and B's ∧ all b's occur together and immediately to the left of the c region.

We show 2 by induction on n. Both of these steps are straightforward in this simple
case. The same general strategy works for other unrestricted grammars, but it may be
substantially more difficult to implement.
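
The invariant I can be tested mechanically on the sentential forms of a sample derivation. This is a sanity check under the assumption of one character per symbol, not a substitute for the proof. The name invariant_holds and the sample derivation are illustrative.

```python
def invariant_holds(st: str) -> bool:
    """Check I: #a = #b + #B = #c, all c's to the right of every a, b and B,
    and all b's in one block that ends where the c region begins."""
    counts_ok = st.count("a") == st.count("b") + st.count("B") == st.count("c")
    first_c = st.find("c")
    c_start = first_c if first_c != -1 else len(st)
    before, after = st[:c_start], st[c_start:]
    cs_rightmost = set(after) <= {"c"}
    bs_adjacent = "b" not in before.rstrip("b")
    return counts_ok and cs_rightmost and bs_adjacent


# One derivation of aabbcc using rules 1, 1, 2, 3, 4, 5 in that order.
derivation = ["S", "aBSc", "aBaBScc", "aBaBcc", "aaBBcc", "aaBbcc", "aabbcc"]
print(all(invariant_holds(form) for form in derivation))
print(invariant_holds("ababcc"))
```

Every form in the sample derivation satisfies I, while the badly ordered string ababcc does not.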

Next we consider another two examples of unrestricted grammars so that we can get
a better idea of how they work.


EXAMPLE  23.2  Equal  Numbers  of  a's,  b's  and  c's 

Let L = {w ∈ {a, b, c}* : #a(w) = #b(w) = #c(w)}. We build a grammar G = (V,
{a, b, c}, R, S), where V and R are as described below and L(G) = L. L is
similar to AnBnCn except that the letters may occur in any order. So, again, we will
begin by generating matched sets of a's, b's, and c's. But, this time, we need to
allow the second phase of the grammar to perform arbitrary permutations of the
characters. So we will start with the rules:

1. S → ABSC

2. S → ε

Next we need to allow arbitrary permutations, so we add the rules:

3. AB → BA

4. BA → AB

5. AC → CA

6. CA → AC

7. BC → CB

8. CB → BC

Finally, we need to generate terminal symbols. Remember that every rule in G
must have at least one nonterminal on its left-hand side. In contrast to AnBnCn,
here the job of the nonterminals is not to force reordering but to enable it. This
means that a nonterminal symbol can be replaced by its corresponding terminal
symbol at any time. So we add the rules:

9. A → a

10. B → b

11. C → c
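
To see the three phases at work, the following sketch replays one derivation of the string cab. It assumes one character per symbol and always rewrites the leftmost occurrence of a rule's left-hand side. The helper apply_rule and the chosen rule sequence are illustrative.

```python
def apply_rule(form: str, lhs: str, rhs: str) -> str:
    """Rewrite the leftmost occurrence of lhs; fail loudly if there is none."""
    if lhs not in form:
        raise ValueError(f"{lhs!r} does not occur in {form!r}")
    return form.replace(lhs, rhs, 1)


steps = [
    ("S", "ABSC"),  # rule 1: generate one A, one B and one C
    ("S", ""),      # rule 2: stop generating
    ("BC", "CB"),   # rule 7: permute
    ("AC", "CA"),   # rule 5: permute
    ("C", "c"),     # rule 11: convert to terminals
    ("A", "a"),     # rule 9
    ("B", "b"),     # rule 10
]

form = "S"
trace = [form]
for lhs, rhs in steps:
    form = apply_rule(form, lhs, rhs)
    trace.append(form)

print(" => ".join(s if s else "ε" for s in trace))
```

The trace ends in cab. Other orders of the same rules yield the other five arrangements of a, b, and c.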


EXAMPLE  23.3  WW 

Consider WW = {ww : w ∈ {a, b}*}. We build a grammar G = (V, {a, b}, R, S),
where V and R are as described below and L(G) = WW. The strategy we will use
is the following:

1. Generate wCw^R#, where C will serve as a temporary middle
marker and # will serve as a temporary right boundary. This is easy
to do with a context-free grammar.

2. Reverse w^R. We will do that by viewing # as a wall and jumping the
characters in w^R, leftmost first, over the wall.

3. Finally, clean up by removing C and #.

Suppose that, after step 1, we have aabCbaa#. We let C spawn a pusher
P, yielding:

aabCPbaa#.

The job of P is to push the character just to its right rightward to the
wall so that it can hop over. To do this, we will write rules that swap Pb
with each character between it and the wall. Those rules will generate the
following sequence of strings:

aabCaPba#

aabCaaPb#

The last step in getting the pushed character (in this case, b) where it
belongs is to jump it over the wall and then erase P, yielding:

aabCaa#b

Next, C will spawn another pusher P and use it to push the first a up to
the wall. At that point, we will have:

aabCaPa#b

Then a jumps the wall, landing immediately after it, yielding:

aabCa#ab

Notice that the substring ba has now become ab, as required. Now the
remaining a can be pushed and jumped, yielding:

aabC#aab

The final step is to erase C# as soon as they become adjacent to each
other.

The following set of rules R implements this plan:

S → T#         /* Generate the wall exactly once.
T → aTa        /* Generate wCw^R.
T → bTb        /*     "
T → C          /*     "
C → CP         /* Generate a pusher P.
Paa → aPa      /* Push one character to the right to get ready to jump.
Pab → bPa
Pba → aPb
Pbb → bPb
Pa# → #a       /* Hop a character over the wall.
Pb# → #b
C# → ε

We have described the way that we want G to work. It clearly can do exactly
what we have said it will do. But can we be sure that it does nothing else?
Remember that, at any point, any rule whose left-hand side matches can be applied. So,
for example, what prevents C from spawning a new pusher P before the first
character has jumped the wall? Nothing. But the correctness of G is not affected by
this. Pushers (and the characters they are pushing) cannot jump over each other.
If C spawns more pushers than are necessary, the resulting string cannot be
transformed into a string containing just terminal symbols. So any path that does that
dies without generating any strings in L(G).

If we want to decrease the number of dead-end paths that a grammar G can
generate, we can write rules that have more restrictive left-hand sides. So, for
example, we could replace the rule:

C → CP         /* Generate a pusher P.

with the rules:

Ca → CPa       /* Generate a pusher P.
Cb → CPb

Now C can only generate one pusher for each character in w^R.
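
The plan above can be replayed by a small driver that chooses, at each step, one rule to apply. The fixed priority order used here (finish the current push, then erase C#, then spawn a new pusher) is an assumption made so that the derivation is deterministic. The grammar itself allows any applicable rule at any time. The name derive_ww is illustrative.

```python
from typing import List

PUSH_AND_HOP = [
    ("Paa", "aPa"), ("Pab", "bPa"), ("Pba", "aPb"), ("Pbb", "bPb"),  # push
    ("Pa#", "#a"), ("Pb#", "#b"),                                    # hop
]
# Priority order: finish the current push, then clean up, then spawn.
ORDERED_RULES = PUSH_AND_HOP + [("C#", ""), ("C", "CP")]


def derive_ww(w: str) -> List[str]:
    """Return the sentential forms of one derivation of ww."""
    if set(w) - {"a", "b"}:
        raise ValueError("w must be a string over {a, b}")
    form = "T#"                                     # S -> T#
    trace = ["S", form]
    for ch in w:                                    # T -> aTa or T -> bTb
        form = form.replace("T", ch + "T" + ch, 1)
        trace.append(form)
    form = form.replace("T", "C", 1)                # T -> C
    trace.append(form)
    while "C" in form:
        for lhs, rhs in ORDERED_RULES:
            if lhs in form:
                form = form.replace(lhs, rhs, 1)
                trace.append(form)
                break
    return trace


for form in derive_ww("aab"):
    print(form if form else "ε")
```

With w = aab the trace passes through aabCaa#b, aabCa#ab, and aabC#aab, the same intermediate strings as in the walkthrough, and ends in aabaab.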


Unrestricted grammars often have a strong procedural feel that is typically absent
from restricted grammars. Derivations usually proceed in phases. When we design a
grammar G, we make sure that the phases work properly by using nonterminals as
flags that tell G what phase it is in. It is very common to have three phases:

• Generate the right number of the various symbols.

• Move them around to get them in the right order.

• Clean up.

In implementing these phases, there are some quite common idioms:

• Begin by creating a left wall, a right wall, or both.

• Reverse a substring by pushing the characters, one at a time, across the wall at the
opposite end from where the first character originated.

• Use nonterminals to represent terminals that need additional processing (such as
shifting from one place to another) before the final string should be generated.

Now that we have seen the extent to which unrestricted grammars can feel like
programs, it may come as no surprise that they are general purpose computing devices. In
Section 23.3 we will show how they can be used not just to define languages but also to
compute functions. But first we will show that the class of languages that can be
generated by an unrestricted grammar is exactly SD. So, sadly, although these grammars can
be used to define decidable languages like AnBnCn, there is no parsing
algorithm for them. Given an unrestricted grammar G and a string w, it is undecidable
whether G generates w. We will prove that in Section 23.2.


23.2  Equivalence  of  Unrestricted  Grammars 
and  Turing  Machines 

Recall that, in our discussion of the Church-Turing thesis, we mentioned several
formalisms that can be shown to be equivalent to Turing machines. We can now add
unrestricted grammars to our list.


Since  rewrite  systems  can  have  the  same  computational  power  as  the  Tur¬ 
ing  machine,  they  have  been  used  to  define  programming  languages  such  as 
Prolog.  (M.2.3) 


THEOREM  23.1  Turing  Machines  and  Unrestricted  Grammars  Describe  the 
Same  Class  of  Languages 

Theorem:  A  language  L  is  generated  by  an  unrestricted  grammar  iff  it  is  semidecided 
by  some  Turing  machine  M. 

Proof:  We  will  prove  each  direction  of  this  claim  separately: 

a.  We  show  that  the  existence  of  an  unrestricted  grammar  G  for  L  implies  the  ex¬ 
istence  of  a  scmidcciding  Turing  machine  for  L  We  do  this  by  construction  of 
a  nondeterministic  Turing  machine  that,  on  input  x,  simulates  applying  the 
rules  of  G,  checking  at  each  step  to  see  whether  G  has  generated  x. 

b.  We  show  that  the  existence  of  a  scmidcciding  Turing  machine  M  for  L  implies 
the  existence  of  an  unrestricted  grammar  for  /..We  do  this  by  construction  of 
a  grammar  that  mimics  the  execution  of  M. 

Proof  of  Claim  a:  Given  an  unrestricted  grammar  G  -  {V.  R.  .S'),  we  construct  a 
nondeterministic  Turing  machine  M  that  scmidecides  L(G).  The  idea  is  that  M 
on  input  x.  will  start  with  S.  apply  rules  in  R .  and  see  whether  it  can  generate  x.  If 
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it  ever  does,  it  will  halt  and  accept.  M  will  be  nondeterministic  so  that  it  can  try 
all  possible  derivations  in  G.  Each  nondeterministic  branch  will  use  two  tapes: 

•  Tape  1  holds  M's  input  string  x. 

•  Tape  2  holds  the  string  that  has  so  far  been  derived  using  G. 

At each step, M nondeterministically chooses a rule to try to apply and a
position on tape 2 to start looking for the left-hand side of the rule. If the rule's
left-hand side matches, M applies the rule. Then it checks whether tape 2 equals tape 1.
If any such branch succeeds in generating x, M accepts. Otherwise, it keeps looking.
If a branch generates a string to which no rules in R can apply, it rejects. So some
branch of M accepts iff there is a derivation of x in G. Thus M semidecides L(G).
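
The construction in Claim a can be sketched in code. The following is a minimal sketch, not the machine M itself: a deterministic breadth-first search stands in for M's nondeterministic choice of rule and position. The grammar RULES is an illustrative one for {a^n b^n c^n : n >= 1} that is not taken from the text, and the names semidecide and max_forms are assumptions. The budget max_forms exists only so that the sketch always returns; a true semidecider would simply keep searching.

```python
from collections import deque

# Illustrative grammar (an assumption, not from the text) for {a^n b^n c^n : n >= 1}.
# Uppercase letters are nonterminals; each rule is (left-hand side, right-hand side).
RULES = [
    ("S", "aSBC"), ("S", "aBC"),
    ("CB", "BC"),
    ("aB", "ab"), ("bB", "bb"),
    ("bC", "bc"), ("cC", "cc"),
]

def semidecide(rules, x, start="S", max_forms=100_000):
    """Search all derivations from `start`, looking for x.

    Returns True if x is derived, False if every branch dies out, and None
    if the budget is exhausted before an answer is found.
    """
    seen = {start}
    queue = deque([start])
    explored = 0
    while queue and explored < max_forms:
        form = queue.popleft()            # plays the role of tape 2
        explored += 1
        if form == x:                     # compare with tape 1 (the input)
            return True
        for lhs, rhs in rules:            # choose a rule ...
            pos = form.find(lhs)
            while pos != -1:              # ... and a position at which to apply it
                new_form = form[:pos] + rhs + form[pos + len(lhs):]
                if new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
                pos = form.find(lhs, pos + 1)
    return None if queue else False
```

For example, semidecide(RULES, "aabbcc") finds a derivation, while semidecide(RULES, "aabc", max_forms=2000) runs out of budget without an answer, which mirrors the fact that M need not halt on strings outside L(G).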

Proof of Claim b: Given a semideciding Turing machine M = (K, Σ, Γ, δ, s, H), we
construct an unrestricted grammar G = ({#, q, 0, 1, A} ∪ Γ, Σ, R, S) such that
L(G) = L(M). The idea is that G will exploit a generate-and-test strategy in which
it first creates candidate strings in Σ* and then simulates running M on them. If
there is some string s that M would accept, then G will clean up its working symbols
and generate s. G operates in three phases:

• Phase 1 can generate all strings of the following form, where q_s is the binary
encoding in <M> of M's start state, n is any positive integer, and each of the
characters a_1 through a_n is in Σ:


# □ □ q_s a_1 a_1 a_2 a_2 a_3 a_3 … a_n a_n □ □ #

The #'s will enable G to exploit rules that need to locate the beginning and the
end of the string that it has derived so that they can scan the string. The rest of
the string directly encodes M's state and the contents of its tape (with each tape
symbol duplicated). It also encodes the position of M's read/write head by
placing the encoding of the state immediately to the left of the character under the
read/write head. So the strings that are generated in Phase 1 can be used in
Phase 2 to begin simulating M, starting in state q_s, with the string a_1a_2a_3…a_n on
its tape and the read/write head positioned immediately to the left of a_1. Each
character on the tape is duplicated so that G can use the second instance as
though it were on the tape, writing on top of it as necessary. Then, if M accepts, G
will use the first instance to reconstruct the input string that was accepted.

• Phase 2 simulates the execution of M on a particular string w. So, for example,
suppose that Phase 1 generated the following string:

# □ □ q000 a a b b c c b b a a □ □ #

Then Phase 2 begins simulating M on the string abcba. The rules of G are
constructed from δ_M. At some point, G might generate

# □ □ a 1 b 2 c c b 4 q011 a 3 □ □ #

if M, when invoked on abcba, could be in state 3, with its tape equal to 12c43,
and its read/write head positioned on top of the final 3.

To implement Phase 2, G contains one or more rules for each element of δ_M.
The rules look like these:

q100 b b → b 2 q101    /* If M, in state 4 looking at b, would rewrite
                          it as 2, go to state 5, and move right.

a a q011 b 4 → q011 a a b 4    /* If M, in state 3 looking at 4, would
                                  rewrite it as 4, go to state 3, and move left.
                                  Notice that to encode moving left we
                                  must create a separate rule for each pair
                                  of characters that could be to the left of
                                  the read/write head.

In Phase 2, all of M's states are encoded in their standard binary form except
its accepting state(s), all of which will be encoded as A.

• Phase 3 cleans up by erasing all the working symbols if M ever reaches an
accepting state. It will leave in its derived string only those symbols
corresponding to the original string that M accepted. Once the derived string contains
only terminal symbols, G halts and outputs the string.

To implement Phase 3, G contains one rule of the following form for each
character other than A and # in its alphabet:

x A → A x    /* If M ever reaches an accepting state,
                sweep A all the way to the left of the string
                until it is next to #.

It also has one rule of the following form for each pair x, y of characters other
than A and #:

#A x y → x #A    /* Sweep #A rightward, deleting the working
                    copy of each symbol and keeping the
                    original version.

And then it has the final rule:

#A# → ε    /* At the end of the sweep, wipe out the last
              working symbols.
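
The Phase 3 cleanup can be traced on a small string. This is a minimal sketch under simplifying assumptions: every symbol is a single character, the blank padding next to the #'s is left out, and the helper names phase3_rules and rewrite_to_end are hypothetical. The working string "#a1b2Acc#" stands for an accepting configuration in which the original input was abc and the working copies are 1, 2, and c.

```python
def phase3_rules(symbols):
    """Build the three kinds of Phase 3 rules over single-character symbols."""
    rules = [(x + "A", "A" + x) for x in symbols]            # x A -> A x
    rules += [("#A" + x + y, x + "#A")                       # #A x y -> x #A
              for x in symbols for y in symbols]
    rules.append(("#A#", ""))                                # #A# -> epsilon
    return rules

def rewrite_to_end(rules, s):
    """Apply the first applicable rule repeatedly; return every string visited."""
    trace = [s]
    while True:
        for lhs, rhs in rules:
            i = s.find(lhs)
            if i != -1:
                s = s[:i] + rhs + s[i + len(lhs):]
                trace.append(s)
                break
        else:                      # no rule applies: the derivation has halted
            return trace

trace = rewrite_to_end(phase3_rules("abc12"), "#a1b2Acc#")
```

In this run A first sweeps left until the string reads "#Aa1b2cc#", then #A sweeps right keeping the first symbol of each pair, and the last string in the trace is "abc".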


23.3  Grammars  Compute  Functions 

We have now shown that grammars and Turing machines are equivalent in their power
to define languages. But we also know that Turing machines can do something else:
They can compute functions by leaving a meaningful result on their tape when they
halt. Can unrestricted grammars do that as well? The answer is yes. Suppose that,
instead of starting with just the start symbol S, we allow a grammar G to be invoked with
some input string w. G would apply its rules as usual. If it then halted, having derived
some new string w' that is the result of applying function f to w, we could say that G
computed f. The only details we need to work out are how to format the input so G will
be able to work effectively and how to tell when G should halt.

We say that a grammar G computes f iff ∀w, v ∈ Σ* (SwS ⇒* v ↔ v = f(w)). We
use G's start symbol S to solve both of the problems we just mentioned: S serves as a
delimiter for the input string so that G will be able to perform actions like, "Start at the
left-hand end of the string and move rightward doing something". And, as usual, G
continues to apply rules until either there are no more rules that can be applied or the
derived string is composed entirely of terminal symbols. So, to halt and report a result,
G must continue until both delimiting S's are removed. A function f is called
grammatically computable iff there is a grammar G that computes it.

Recall the family of functions value_k(n). For any positive integer k, value_k(n) returns
the natural number that is encoded, base k, by the string n. For example, value_2(101) = 5.


EXAMPLE  23.4  Computing  the  Successor  Function  in  Unary 

Let f be the successor function succ(n) on the unary representations of natural
numbers. Specifically, define f(n) = m, where value_1(m) = value_1(n) + 1. If G is
to compute f, it will need to produce derivations such as:

S1S ⇒* 11
S1111S ⇒* 11111

We need to design G so that it adds exactly one more 1 to the input string and
gets rid of both S's. The following two-rule grammar G = ({S, 1}, {1}, R, S) does
that with R =

S11 → 1S1    /* Move the first S rightward past all but the last 1.

S1S → 11     /* When it reaches the last 1, add a 1 and remove the S's.


EXAMPLE  23.5  Multiplying  by  2  in  Unary 

Let f(n) = m, where value_1(m) = 2·value_1(n). G should produce derivations
such as:

S11S ⇒* 1111
S1111S ⇒* 11111111

G needs to go through its input string, turning every 1 into 11. Again, a simple
two-rule grammar G = ({S, 1}, {1}, R, S) is sufficient, with R =

S1 → 11S    /* Starting from the left, duplicate a 1. Shift the initial S so
               that only the 1's that still need to be duplicated are on its
               right.

SS → ε      /* When all 1's have been duplicated, stop.
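
The two unary grammars of Examples 23.4 and 23.5 are small enough to run mechanically. The sketch below is an illustration rather than part of the text: the function name results and the rule-list names SUCC and DOUBLE are assumptions. It explores every derivation from SwS and collects the strings to which no rule applies, so that the "iff" in the definition of "G computes f" can be checked on small inputs.

```python
SUCC = [("S11", "1S1"), ("S1S", "11")]      # Example 23.4: successor in unary
DOUBLE = [("S1", "11S"), ("SS", "")]        # Example 23.5: multiply by 2 in unary

def results(rules, w):
    """Return every string reachable from SwS to which no rule applies."""
    start = "S" + w + "S"
    seen, stack, halted = {start}, [start], set()
    while stack:
        form = stack.pop()
        successors = set()
        for lhs, rhs in rules:
            i = form.find(lhs)
            while i != -1:                  # try the rule at every position
                successors.add(form[:i] + rhs + form[i + len(lhs):])
                i = form.find(lhs, i + 1)
        if not successors:
            halted.add(form)                # derivation halts here
        for nxt in successors - seen:
            seen.add(nxt)
            stack.append(nxt)
    return halted
```

On the inputs shown in the examples, results(SUCC, "1111") contains only "11111" and results(DOUBLE, "11") contains only "1111". The search terminates for these two grammars because every rule either shortens the string or moves the leading S to the right.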




EXAMPLE  23.6  Squeezing  Out  Extra  Blanks 

Let f(x: x ∈ {a, b, □}*) = x except that extra blanks will be squeezed out. More
specifically, blanks will be removed so that there is never more than one □
between "words", i.e., sequences of a's and b's, and there are no leading or trailing
□'s. G should produce derivations such as:

Saa□b□□aa□□□□b□S ⇒* aa□b□aa□b
S□aa□b□□aa□b□□S ⇒* aa□b□aa□b

This time, G is more complex. It must reduce every □□ string to □. And it must
get rid of all □'s that occur adjacent to either S.

G = ({S, T, a, b, □}, {a, b, □}, R, S), where R =

S□ → S       /* Get rid of leading □'s. All blanks get squeezed, not just
                repeated ones.

SS → ε       /* In case there are no nonblank characters.

Sa → aT      /* T replaces S to indicate that we are no longer in a
                leading □'s region. For the rest of G's operation, all
                characters to the left of T will be correct. Those to the right
                still need to be processed.

Sb → bT

Ta → aT      /* Sweep T across a's and b's.

Tb → bT

T□□ → T□     /* Squeeze repeated □'s.

T□a → □aT    /* Once there is a single □, sweep T past it and the first
                letter after it.

T□b → □bT

T□S → ε      /* The T□□ rule will get rid of all but possibly one □ at
                the end of the string.

TS → ε       /* If there were no trailing □'s, this rule finishes up.
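
The blank-squeezing grammar of Example 23.6 can also be run directly. The following is a sketch under stated assumptions: the rule list is this section's reading of R (with T as the working marker and ε written as the empty string), and the name squeeze is hypothetical. In this grammar the marker S or T and the one or two symbols after it determine at most one applicable rule, so a simple loop that applies the first matching rule is enough.

```python
BLANK = "□"

# Rules of Example 23.6 as read here; each rule is (left-hand side, right-hand side).
RULES = [
    ("S" + BLANK, "S"),                        # drop leading blanks
    ("SS", ""),                                # input had no nonblank characters
    ("Sa", "aT"), ("Sb", "bT"),                # leave the leading-blank region
    ("Ta", "aT"), ("Tb", "bT"),                # sweep T across letters
    ("T" + BLANK + BLANK, "T" + BLANK),        # squeeze repeated blanks
    ("T" + BLANK + "a", BLANK + "aT"),         # keep a single blank between words
    ("T" + BLANK + "b", BLANK + "bT"),
    ("T" + BLANK + "S", ""),                   # drop the last trailing blank
    ("TS", ""),                                # no trailing blanks at all
]

def squeeze(w):
    """Derive from SwS until no rule applies and return the final string."""
    form = "S" + w + "S"
    while True:
        for lhs, rhs in RULES:
            i = form.find(lhs)
            if i != -1:
                form = form[:i] + rhs + form[i + len(lhs):]
                break
        else:
            return form
```

Calling squeeze("□aa□b□□aa□b□□") reproduces the second derivation shown in the example and yields "aa□b□aa□b".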

From this last example, it is easy to see how we can construct a grammar G to
compute some function f. G can work in very much the way a Turing machine would,
sweeping back and forth through its string. In fact, it is often easier to build a grammar
to compute a function than it is to build a Turing machine because, with grammars, we
do not have to worry about shifting the string if we want to add or delete characters
from somewhere in the middle.

Recall that in Section 17.2.2, we defined a computable function to be a function that
can be computed by a Turing machine that always halts and, on input x, leaves f(x) on
its tape. We now have an alternative definition:


THEOREM 23.2 Turing Machines and Unrestricted Grammars
Compute the Same Class of Functions

Theorem: A function f is computable iff it is grammatically computable. In other
words, a function f can be computed by a Turing machine iff there is an unrestricted
grammar that computes it.

Proof: The proof requires two constructions. The first shows that, for any grammar
G, there is a Turing machine M that simulates G, halts whenever G produces a
terminal string s, and leaves s on its tape when it halts. The second shows the other
direction: For any Turing machine M, there is a grammar G that simulates M and
produces a terminal string s whenever M halts with s on its tape. These
constructions are similar to the ones we used to prove Theorem 23.1. We omit the details.


23.4  Undecidable  Problems  About  Unrestricted  Grammars 

Consider  the  following  questions  that  we  might  want  to  ask  about  unrestricted 
grammars: 

• Given an unrestricted grammar G and a string w, is w ∈ L(G)?

• Given an unrestricted grammar G, is ε ∈ L(G)?

• Given two unrestricted grammars G_1 and G_2, is L(G_1) = L(G_2)?

• Given an unrestricted grammar G, is L(G) = ∅?

Does there exist a decision procedure to answer any of these questions? Or,
formulating these problems as language recognition tasks, are any of these languages
decidable?

• L_a = {<G, w> : G is an unrestricted grammar and w ∈ L(G)}

• L_b = {<G> : G is an unrestricted grammar and ε ∈ L(G)}

• L_c = {<G_1, G_2> : G_1 and G_2 are unrestricted grammars and L(G_1) = L(G_2)}

• L_d = {<G> : G is an unrestricted grammar and L(G) = ∅}

The answer to all these questions is "no". If any of these problems involving
grammars were solvable, then the corresponding problem involving Turing machines would
also be solvable. But it isn't. We can prove each of these cases by reduction. We will do
one here and leave the others as exercises. They are very similar.


THEOREM  23.3  Undecidability  of  Unrestricted  Grammars 


Theorem: The language L_a = {<G, w> : G is an unrestricted grammar and
w ∈ L(G)} is not in D.

Proof: We show that A ≤_M L_a and so L_a is not decidable. Let R be a mapping
reduction from A = {<M, w> : Turing machine M accepts w} to L_a, defined as
follows:

R(<M, w>) =

1. From M, construct the description <G#> of a grammar G# such that
L(G#) = L(M).

2. Return <G#, w>.

If Oracle exists and decides L_a, then C = Oracle(R(<M, w>)) decides A. R
can be implemented as a Turing machine using the algorithm presented in
Section 23.2. And C is correct:

• If <M, w> ∈ A: M(w) halts and accepts. w ∈ L(M). So w ∈ L(G#).
Oracle(<G#, w>) accepts.

• If <M, w> ∉ A: M(w) does not accept. w ∉ L(M). So w ∉ L(G#).
Oracle(<G#, w>) rejects.

But no machine to decide A can exist, so neither does Oracle.

So, unrestricted grammars, although powerful, are very much less useful than
context-free ones, since even the most basic question, "Given a string w, does G
generate w?" is undecidable.

23.5  The  Word  Problem  for  Semi-Thue  Systems 

Unrestricted  grammars  can  generate  languages  and  they  can  compute  functions.  A  third 
way  of  characterizing  their  computational  power  has  played  an  important  historical  role 
in  the  development  of  formal  language  theory.  Define  a  word  problem  to  be  the  following: 

Given two strings, w and v, and a rewrite system T, determine whether v can be
derived from w using T.


An important application of the word problem is in logical reasoning. If we
can encode logical statements as strings (in the obvious way) and if we can
also define a rewrite system T that corresponds to a set of inference rules,
then determining whether v can be rewritten as w using T is equivalent to
deciding whether the sentence that corresponds to v is entailed by the
sentence that corresponds to w.


Any rewrite system whose job is to transform one string into another must be
able to start with an arbitrary string (not just some unique start symbol). Further, since
both the starting string and the ending one may contain any symbols in the alphabet of
the system, the distinction between terminal symbols and working symbols goes away.
Making those two changes to the unrestricted grammar model, we get:

A semi-Thue system T is a pair (Σ, R), where:

• Σ is an alphabet, and

• R (the set of rules) is a subset of Σ+ × Σ*.

Semi-Thue systems were named after their inventor, the Norwegian mathematician
Axel Thue. Just as an aside: A Thue system is a semi-Thue system with the additional
property that, if R contains the rule x → y, then it also contains the rule y → x.

We define for semi-Thue systems the derives-in-one-step relation (⇒_T) and its
transitive closure derives (⇒_T*) exactly as we did for unrestricted grammars. Since
there is no distinguished start symbol, it doesn't make sense to talk about the
language that can be derived from a semi-Thue system. It does, however, make sense to
talk about the word problem: Given a semi-Thue system T and two strings w and v,
determine whether w ⇒_T* v. We have already seen that it is undecidable, for an
unrestricted grammar with a distinguished start symbol S, whether S derives some
arbitrary string w. So too it is undecidable, for a semi-Thue system, whether one arbitrary
string can derive another.


THEOREM  23.4  Undecidability  of  the  Word  Problem  for  Semi-Thue  Systems 

Theorem: The word problem for semi-Thue systems is undecidable. In other words,
given a semi-Thue system T and two strings w and v, it is undecidable whether
w ⇒_T* v.

Proof: The proof is by reduction from the halting problem language H = {<M, w> :
Turing machine M halts on input string w}. Given a Turing machine M and an
input string w, we will build a semi-Thue system T with the property that M halts
on w iff w ⇒_T* S. The construction that is used in the reduction mirrors the
construction that we used to prove (Theorem 23.1) that Turing machines and
unrestricted grammars define the same class of languages. Given a Turing machine M,
we first build a semi-Thue system T whose rules simulate the operation of M.
Assume that the symbol S has not been used in the construction so far. We now add
rules to T as follows:

• For every halting state q in M, add the rule q → S. These rules guarantee that,
if M eventually enters some halting state, the symbol S will appear in a string
derived by T. Since S is an otherwise unused symbol, that is the only case in
which it will appear in a string that T derives.

• For every other symbol c in T's alphabet, add the rules cS → S and Sc → S.
These rules enable T to transform any string containing the symbol S into a
string that is just S.

So if, on input w, M would ever enter a halting state, the rules of T will enable
w to derive some string that contains the symbol S. That string will then derive
the string consisting of only S. And M entering a halting state is the only way to
generate an S. So M halts on input w iff w ⇒_T* S. Thus, if the word problem were
decidable, H would also be. But it isn't.
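
The two kinds of rules added in this proof can be illustrated on a toy system. This is only a sketch: it omits the rules that simulate M, treats the single character q as a hypothetical halting-state symbol, and uses the assumed names halting_rules, derives, and max_forms. Because the word problem is undecidable in general, the search carries a budget and may return None.

```python
from collections import deque

def halting_rules(halting_states, alphabet, marker="S"):
    """Rules added in the proof: q -> S, then cS -> S and Sc -> S."""
    rules = [(q, marker) for q in halting_states]
    for c in alphabet:
        if c != marker:
            rules += [(c + marker, marker), (marker + c, marker)]
    return rules

def derives(rules, w, v, max_forms=10_000):
    """Bounded search for w =>* v. None means the budget ran out."""
    seen, queue = {w}, deque([w])
    explored = 0
    while queue and explored < max_forms:
        form = queue.popleft()
        explored += 1
        if form == v:
            return True
        for lhs, rhs in rules:
            i = form.find(lhs)
            while i != -1:
                nxt = form[:i] + rhs + form[i + len(lhs):]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                i = form.find(lhs, i + 1)
    return None if queue else False

TOY = halting_rules(halting_states="q", alphabet="abq")
```

With these rules, derives(TOY, "abqba", "S") succeeds because the halting symbol q becomes S and S then absorbs its neighbors, while derives(TOY, "abba", "S") fails because no S can ever appear.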


Exercises 

1.  Show  an  unrestricted  grammar  that  generates  each  of  the  following  languages. 

a. {a^(2^n) b^(2^n) : n ≥ 0}

b. {a^n b^m c^(n+m) : n, m > 0}

c. {a^n b^m c^(nm) : n, m > 0}

d. {a^n b^(2n) c^(3n) : n ≥ 1}

e. {w w^R w : w ∈ {a, b}*}

f. {a^n b^n a^n b^n : n ≥ 0}

g. {x y # x^R : x, y ∈ {a, b}* and |x| = |y|}

h. {w c^m d^n : w ∈ {a, b}* and m = #_a(w) and n = #_b(w)}

2. Show a grammar that computes each of the following functions (given the input
convention described in Section 23.3):

a. f: {a, b}+ → {a, b}+, where f(s = a_1a_2a_3…a_|s|) = a_2a_3…a_|s|a_1. For
example, f(aabbaa) = abbaaa.

b. f: {a, b}+ → {a, b, 1}+, where f(s) = s1^n, where n = #_a(s). For example,
f(aabbaa) = aabbaa1111.

c. f: {a, b}*#{a, b}* → {a, b}*, where f(x#y) = x y^R.

d. f: {a, b}+ → {a, b}*, where f(s) = if #_a(s) is even then s, else s^R.

e. f: {a, b}* → {a, b}*, where f(w) = ww.

f. f: {a, b}+ → {a, b}*, where f(s) = if |s| is even then s, else s with the middle
character chopped out. (Hint: The answer to this one is fairly long, but it is not
very complex. Think about how you would use a Turing machine to solve this
problem.)

g. f(n) = m, where value_1(n) is a natural number and value_1(m) =
value_1(⌊n/2⌋). Recall that ⌊x⌋ (read as "floor of x") is the largest integer that is less
than or equal to x.

h. f(n) = m, where value_2(n) is a natural number and value_2(m) =
value_2(n) + 5.

3. Show that, if G, G_1 and G_2 are unrestricted grammars, then each of the following
languages, defined in Section 23.4, is not in D:

a. L_b = {<G> : ε ∈ L(G)}

b. L_c = {<G_1, G_2> : L(G_1) = L(G_2)}

c. L_d = {<G> : L(G) = ∅}

4. Show that, if G is an unrestricted grammar, then each of the following languages
is not in D.

a. {<G> : a* ⊆ L(G)}

b. {<G> : G is ambiguous} (Hint: Prove this by reduction from PCP.)

5. Let G be the unrestricted grammar for the language AnBnCn = {a^n b^n c^n : n ≥ 0},
shown in Example 23.1. Consider the proof, given in E.4, of the undecidability of
the Post Correspondence Problem. The proof is by reduction from the
membership problem for unrestricted grammars.

a. Define the MPCP instance MP that will be produced, given the input <G,
abc>, by the reduction that is defined in the proof of Theorem E.1.

b.  Find  a  solution  for  MP. 

c.  Define  the  PCP  instance  P  that  will  be  built  from  MP  by  the  reduction  that  is 
defined  in  the  proof  of  Theorem  E.2. 

d. Find a solution for P.


The Chomsky Hierarchy
and  Beyond  * 

So far, we have described a hierarchy of language classes, including the regular
languages, the context-free languages, the decidable languages (D), and the
semidecidable languages (SD). The smaller classes have useful properties, including
efficiency and decidability, that the larger classes lack. But they are more limited in
what they can do. In particular, PDAs are not powerful enough for most applications.
But, to do better, we have jumped to Turing machines and, in so doing, have given up
the ability to decide even the most straightforward questions.

The question naturally arises, "Are there other formalisms that can effectively
describe useful languages?" The answer is yes and we will consider a few of them in
this chapter.


24.1  The  Context-Sensitive  Languages 

We would like a computational formalism that accepts exactly the set D. We have one:
The set of Turing machines that always halt. But that set is itself undecidable. What we
would like is a computational model that comes close to describing exactly the class D
but that is itself decidable in the sense that we can look at a program and tell whether
or not it is an instance of our model.

In this section we'll describe the context-sensitive languages, which fit into our
existing language hierarchy as shown in Figure 24.1. The context-sensitive languages can
be decided by a class of automata called linear bounded automata. They can also be
described by grammars we will call context-sensitive (because they allow multiple
symbols on the left-hand sides of rules). The good news about the context-sensitive
languages is that many interesting languages that are not context-free are
context-sensitive. But the bad news is that, while a parsing algorithm for them does exist, no
efficient one is known nor is one likely to be discovered.



FIGURE  24.1  A  hierarchy  of  language  classes. 


24.1.1 Linear Bounded Automata

There are two common definitions of a linear bounded automaton (or LBA). The
crucial aspect of both is that an LBA is a Turing machine whose tape is limited by the
length of its input. The two definitions can be stated informally as:

1. An LBA is a Turing machine that cannot move past the blank square on either
side of its input.

2. An LBA is a Turing machine that cannot use more than k·|w| tape squares,
where w is its input and k is some fixed positive integer.

The second definition seems, at first glance, to be less restrictive, since it allows
for additional working space on the tape. But, in fact, the two definitions are
equivalent, since we can implement a definition 2 LBA as a definition 1 LBA whose tape is
divided into tracks to simulate k tapes. So it is just a tradeoff between more tape
squares and a larger tape alphabet. Because the first definition is slightly simpler,
we will use it.

A linear bounded automaton (or LBA) B = (K, Σ, Γ, Δ, s, H) is a
nondeterministic Turing machine that cannot move off the tape region that starts at the
blank to the left of the input and ends at the blank immediately after the input. If
an LBA attempts to move off that region, the read/write head simply stays where it
is. This definition is slightly nonstandard. The usual one restricts the head to the
input string proper, but the version we give here lets us maintain the programming
style that we have been using, in which the read/write head starts on the blank just
to the left of the input and we detect the end of the input when we find the blank
immediately to its right.

A  language  L  is  context-sensitive  iff  there  exists  an  LBA  that  accepts  it. 

Almost all of the deterministic deciding Turing machines that we have described so
far have been LBAs. For example, the machines we built in Chapter 17 for AnBnCn and
WcW are both LBAs. So AnBnCn and WcW are context-sensitive languages.

And now to the reason that it made sense to define the LBA: The halting
problem for LBAs is decidable and thus the membership question for context-sensitive
languages is decidable.


THEOREM  24.1  Decidability  of  LBAs 

Theorem: The language L = {<B, w> : LBA B accepts w} is in D.

Proof: Although L looks very much like A, the acceptance language for Turing
machines, its one difference, namely that it asks about an LBA rather than an
arbitrary Turing machine, is critical. We observe the following property of an LBA B
operating on some input w: B can be in any one of its |K| states. The tape that B
can look at has exactly |w| + 2 squares. Each of those squares can contain any
value in Γ and the read/write head can be on any one of them. So the number of
distinct configurations of B is:

MaxConfigs = |K| · |Γ|^(|w|+2) · (|w| + 2).

If B ever reaches a configuration that it has been in before, it will do the same
thing the second time that it did the first time. So, if it runs for more than
MaxConfigs steps, it is in a loop and it is not going to halt.
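
The counting argument can be made concrete with a few lines of arithmetic. This is a minimal sketch; the function name max_configs and its parameter names are assumptions. It multiplies the number of states, the number of possible tape contents over |w| + 2 squares, and the number of head positions.

```python
def max_configs(num_states, tape_alphabet_size, input_length):
    """Upper bound on the distinct configurations of an LBA on an input.

    num_states          -- |K|
    tape_alphabet_size  -- size of the tape alphabet
    input_length        -- |w|
    """
    squares = input_length + 2                  # input plus one blank on each side
    tape_contents = tape_alphabet_size ** squares
    head_positions = squares
    return num_states * tape_contents * head_positions
```

For instance, a hypothetical LBA with 2 states and a 3-symbol tape alphabet, run on an input of length 1, has at most 2 · 3^3 · 3 = 162 configurations, so any path longer than 162 steps must repeat a configuration.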

We are now ready to define a nondeterministic Turing machine that decides L:

M(<B, w>) =

1. Simulate all paths of B on w, running each for MaxConfigs steps or until
B halts, whichever comes first.

2. If any path accepted, accept; else reject.

Since, from each configuration of B, there is a finite number of branches and
each branch is of finite length, M will be able to try all branches of B in a finite
number of steps. M will accept the string <B, w> if any path of B, running on w,
accepts (i.e., B itself would accept) and it will reject the string <B, w> if every
path of B on w either rejects or loops.

We defined an LBA to be a nondeterministic Turing machine with bounded tape.
Does nondeterminism matter for LBAs? Put another way, for any nondeterministic
LBA B does there exist an equivalent deterministic LBA? No one knows.


24.1.2 Context-Sensitive Grammars

Why have we chosen to call the class of languages that can be accepted by an LBA
"context-sensitive"? Because there exists a grammar formalism that exactly describes these
languages and this formalism, like the unrestricted grammar formalism on which it is based,
allows rules whose left-hand sides describe the context in which the rules may be applied.

A context-sensitive grammar G = (V, Σ, R, S) is an unrestricted grammar in which
R satisfies the following constraints:

• The left-hand side of every rule contains at least one nonterminal symbol.

• If R contains the rule S → ε then S does not occur on the right-hand side of any rule.

• With the exception of the rule S → ε, if it exists, every rule α → β in R has the
property that |α| ≤ |β|. In other words, with the exception of the rule S → ε,
there are no length-reducing rules in R.

We should point out here that this definition is a bit nonstandard. The more common
definition allows no length-reducing rules at all. But without the exception for S → ε, it
is not possible to generate any language that contains ε. So the class of languages that
could be generated would not include any of the classes that we have so far considered.

We define ⇒ (derives-in-one-step), ⇒* (derives), and L(G) analogously to the way
they were defined for context-free and unrestricted grammars.

Some  of  the  grammars  (both  context-free  and  unrestricted)  that  we  have  written  so 
far  are  context-sensitive.  But  many  are  not. 


EXAMPLE 24.1 AnBn

For the language AnBn, we wrote the grammar:

S → aSb
S → ε

That grammar is not context-sensitive. But the following equivalent grammar is:

S → ε
S → S1
S1 → aS1b
S1 → ab

Because of the prohibition against length-reducing rules, the problem of determining
whether a context-sensitive grammar G generates some string w is decidable. Recall
that it was not decidable for unrestricted grammars, so this is a significant change.


THEOREM 24.2 Decidability of Context-Sensitive Grammars

Theorem: The language L = {<G, w> : context-sensitive grammar G generates
string w} is in D.

Proof: We construct a nondeterministic Turing machine M to decide L. M will explore
all derivations that G can produce starting from its start symbol. Eventually
one of the following things must happen on every derivation path:

• G will generate w.

• G will generate a string to which no rules can be applied. The path will end.

• G will keep generating strings of the same length. Since there is a finite number
of strings of a given length, G must eventually generate the same one
twice. Whenever that happens, the path can be terminated since it is not getting
any closer to generating w.

• G will generate a string s that is longer than w. The path can be terminated.
Since there are no length-reducing rules, there is no way that w could ever be
derived from s. It is this case that distinguishes context-sensitive grammars
from unrestricted ones.

Since G has only a finite number of choices at each derivation step and since
each path that is generated must eventually end, the Turing machine M that explores
all derivation paths will eventually halt. If at least one path generates w, M
will accept. If no path generates w, M will reject.
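
The argument above can be tried out on small grammars. The following is a minimal sketch, not the Turing machine M itself: it replaces nondeterminism with a breadth-first search, and it assumes that every grammar symbol is a single character and that rules are given as (left-hand side, right-hand side) string pairs. The function name csg_generates is hypothetical, and the special rule S → ε is not handled.

```python
from collections import deque

def csg_generates(rules, start, w):
    """Decide whether a grammar with no length-reducing rules derives w.

    rules: list of (lhs, rhs) strings; start: the start symbol.
    """
    if any(len(rhs) < len(lhs) for lhs, rhs in rules):
        raise ValueError("length-reducing rule: not context-sensitive")
    seen = {start}                 # a repeated string ends its path
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == w:
            return True
        for lhs, rhs in rules:
            i = form.find(lhs)
            while i != -1:
                nxt = form[:i] + rhs + form[i + len(lhs):]
                # A string longer than w can never shrink back to w.
                if len(nxt) <= len(w) and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                i = form.find(lhs, i + 1)
    return False
```

The search must stop because only finitely many strings of length at most |w| exist, and each is queued at most once. This mirrors the last two cases of the proof.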


24.1.3  Equivalence  of  Linear  Bounded  Automata 
and  Context-Sensitive  Grammars 

We now have a new computational model, the LBA, and we have shown that it is decidable
whether some LBA B accepts a string w. We also have a new grammatical framework,
context-sensitive grammars, and we have shown that it is decidable whether or not a
context-sensitive grammar G generates some string w. That similarity, along with the terminology
that we have been using, should cause the following theorem to come as no surprise:

THEOREM  24.3  Equivalence  of  LBAs  and  Context-Sensitive  Grammars 

Theorem:  The  class  of  languages  that  can  be  described  with  a  context-sensitive 
grammar  is  exactly  the  same  as  the  class  of  languages  that  can  be  accepted  by 
some  LBA.  Alternatively,  a  language  is  context-sensitive  iff  it  can  be  generated  by 
some  context-sensitive  grammar. 

Proof: The proof is very similar to the one that we did of Theorem 23.1, which asserted
the equivalence of unrestricted grammars and Turing machines. We must do two
proofs:

• We show that, given a context-sensitive grammar G, there exists an LBA B
such that L(G) = L(B). We do this by construction of B from G. B uses a
two-track tape (simulated on one tape). On input w, B keeps w on the first
track. On the second track, it nondeterministically constructs a derivation
using G, but with one exception. Any path that is about to generate a string
that is longer than w will halt immediately. So B never needs a tape longer
than |w| and is thus an LBA.

• We show that, given an LBA B, there exists a context-sensitive grammar G such
that L(B) = L(G). As in the proof of Theorem 23.1, G will simulate the operation
of B. The design of G is a bit more complex now because it cannot use working
symbols that get erased at the end. However, that problem can be solved
with an appropriate encoding of the nonterminal symbols.


24.1.4  Where  Do  Context-Sensitive  Languages 
Fit  in  the  Language  Hierarchy? 

Our motivation in designing the LBA was to get the best of both worlds: something
closer to the power of a Turing machine, but with the decidability properties of a PDA.
Have we succeeded? Both of the languages {<B, w> : LBA B accepts w} and
{<G, w> : context-sensitive grammar G generates string w} are decidable. And we
have seen at least one example, AⁿBⁿCⁿ, of a language that is not context-free but is
context-sensitive. In this section, we state and prove two theorems that show that the
picture at the beginning of this chapter is correctly drawn.

THEOREM  24.4  The  Context-Sensitive  Languages  are  a  Proper  Subset  of  D 

Theorem:  The  context-sensitive  languages  are  a  proper  subset  of  D. 

Proof: We divide the proof into two parts. We first show that every context-sensitive
language is in D. Then we show that there exists at least one language that is in D
but that is not context-sensitive.

The first part is easy. Every context-sensitive language L is accepted by some LBA
B. So the Turing machine that simulates B as described in the proof of Theorem 24.1
decides L.

Second, we must prove that there exists at least one language that is in D but
that is not context-sensitive. It is not easy to do this by actually exhibiting such a
language. But we can use diagonalization to show that one exists.

We consider only languages with Σ = {a, b}. First we must define an enumeration
of all the context-sensitive grammars with Σ = {a, b}. To do that, we need
an encoding of them. We can use a technique very much like the one we used to
encode Turing machines. Specifically, we will encode a grammar G = (V, Σ, R, S)
as follows:

• Encode the nonterminal alphabet: Let k = |V − Σ| be the number of nonterminal
symbols in G. Let n be the number of binary digits required to represent the
integers 0 to k − 1. Encode the set of nonterminal symbols (V − Σ) as
xd₁d₂...d_n, where each d_i ∈ {0, 1}. Let x00...0 (with n zeros) correspond to S.

• Encode the terminal alphabet, {a, b}, as a and b.

• Encode each rule α → β in R as: A → B, where A is the concatenation of the
encodings of all of the symbols of α and B is the concatenation of the encodings
of all of the symbols of β. So the encoding of a rule might look like:

ax01b → bx01a

• Finally, encode G by concatenating together its rules, separated by ;'s. So a
complete grammar G might be encoded as:

x00 → ax00a ; x00 → x01 ; x01 → bx01b ; x01 → b
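
The encoding scheme above can be sketched in a few lines. This is an illustration under stated assumptions, not part of the proof: the function name encode_grammar is hypothetical, rules are given as (left-hand side, right-hand side) lists of symbols, the arrow is written as the ASCII pair ->, and the order in which nonterminals other than S receive their codes (alphabetical here) is an arbitrary choice.

```python
def encode_grammar(rules, nonterminals, start):
    """Encode a grammar with terminals {a, b} as a single string."""
    # The start symbol gets index 0, so its code is x followed by zeros.
    ordered = [start] + sorted(n for n in nonterminals if n != start)
    k = len(ordered)
    # Binary digits needed for 0 .. k-1 (at least one digit).
    n_bits = max(1, (k - 1).bit_length())
    code = {nt: "x" + format(i, f"0{n_bits}b") for i, nt in enumerate(ordered)}

    def enc(symbols):
        # Terminals a and b encode as themselves.
        return "".join(code.get(s, s) for s in symbols)

    return ";".join(f"{enc(lhs)}->{enc(rhs)}" for lhs, rhs in rules)
```

With only two nonterminals one binary digit suffices, so this sketch produces codes such as x0 and x1 where the sample encoding in the text shows two digits.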

Let Enum_G be the lexicographic enumeration of all encodings, as just described,
of context-sensitive grammars with Σ = {a, b}. Let Enum_a,b be the lexicographic
enumeration of {a, b}*. We can now imagine the infinite table shown in
Table 24.1. Column 0 contains the elements of Enum_G. Row 0 contains the elements
of Enum_a,b. Each other cell, with index (i, j), is 1 if grammar_i generates
string_j and 0 otherwise. Because {<G, w> : context-sensitive grammar G generates
string w} is in D, there exists a Turing machine that can compute the values in
this table as they are needed.

Now define the language L_D = {string_i : string_i ∉ L(G_i)}. L_D is:

• In D because it is decided by the following Turing machine M:

M(x) =

1. Find x in the list Enum_a,b. Let its index be i. (In other words, column i
corresponds to x.)

2. Look up cell (i, i) in the table.

3. If the value is 0, then x is not in L(G_i) so x is in L_D, so accept.

4. If the value is 1, then x is in L(G_i) so x is not in L_D, so reject.

•  Not  context-sensitive  because  it  differs,  in  the  case  of  at  least  one  string,  from 
every  language  in  the  table  and  so  is  not  generated  by  any  context-sensitive 
grammar. 
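
The diagonal construction can be checked on a finite corner of the table. The sketch below is only an illustration: TABLE holds the sample entries of Table 24.1 for the first five grammars and six strings, indices are 0-based rather than 1-based, and the name in_diagonal_language is hypothetical. In the real proof each entry would be computed on demand by the decision procedure of Theorem 24.2.

```python
# Rows are grammars, columns are strings; entry 1 means "generates".
TABLE = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
]

def in_diagonal_language(table, i):
    """String i is in L_D exactly when grammar i does not generate it."""
    return table[i][i] == 0

# Membership bits of L_D for the first five strings.
diagonal = [int(in_diagonal_language(TABLE, i)) for i in range(len(TABLE))]

# L_D disagrees with the language of grammar i on string i, for every i.
differs_from_every_row = all(diagonal[i] != TABLE[i][i] for i in range(len(TABLE)))
```

Because the disagreement is forced on the diagonal, no row of the table, however far the table is extended, can describe L_D.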


Table 24.1 Using diagonalization to show that there exist decidable languages
that are not context-sensitive.

            String₁  String₂  String₃  String₄  String₅  String₆  ...
Grammar₁       1        0        0        0        0        0     ...
Grammar₂       0        1        0        0        0        0     ...
Grammar₃       1        1        0        0        0        0     ...
Grammar₄       0        0        1        0        0        0     ...
Grammar₅       1        0        1        0        0        0     ...
...           ...      ...      ...      ...      ...      ...    ...



THEOREM  24.5  The  Context-Free  Languages  are  a  Proper  Subset  of  the 
Context-Sensitive  Languages 

Theorem:  The  context-free  languages  are  a  proper  subset  of  the  context-sensitive 
languages. 

Proof: We know one language, AⁿBⁿCⁿ, that is context-sensitive but not context-free.
So it remains only to show that every context-free language is context-sensitive.

If L is a context-free language then there exists some context-free grammar
G = (V, Σ, R, S) that generates it. Convert G to Chomsky normal form, producing
G'. G' generates L − {ε}. G' is a context-sensitive grammar because it has no
length-reducing rules. If ε ∈ L, then create in G' a new start symbol S' (distinct from
any other symbols already in G'), and add the rules S' → ε and S' → S. G' is still a
context-sensitive grammar and it generates L. So L is a context-sensitive language.


24.1.5  Closure  Properties  of  the  Context-Sensitive  Languages 

The  context-sensitive  languages  exhibit  strong  closure  properties.  In  order  to  prove 
that,  it  is  useful  first  to  prove  a  normal  form  theorem  for  context-sensitive  grammars. 
We  will  do  that  here,  and  then  go  on  to  prove  a  set  of  closure  theorems. 

A context-sensitive grammar G = (V, Σ, R, S) is in nonterminal normal form iff all
rules in R are of one of the following two forms:

• α → c, where α is an element of (V − Σ) and c ∈ Σ, or

• α → β, where both α and β are elements of (V − Σ)⁺.

In other words, the set of nonterminals includes one for each terminal symbol and it
is the job of that nonterminal simply to generate its associated terminal symbol. G does
almost all of its work manipulating only nonterminals. At the end, the terminal symbols are
generated. Once terminal symbols have been generated, no further rules can apply to
them since no rules have any terminals in their left-hand sides.


THEOREM  24.6  Nonterminal  Normal  Form  for  Context-Sensitive 
Grammars 


Theorem: Given a context-sensitive grammar G, there exists an equivalent nonterminal
normal form grammar G' such that L(G') = L(G).

Proof: The proof is by construction. From G we create G' using the algorithm
converttononterminal, defined as follows:


converttononterminal(G: context-sensitive grammar) =

1. Initially, let G' = G.

2. For each terminal symbol c in Σ, create a new nonterminal symbol T_c and
add to R_G' the rule T_c → c.

3. Modify each of the original rules (not including the ones that were just created)
so that every occurrence of a terminal symbol c is replaced by the
nonterminal symbol T_c.

4. Return G'.

Note that no length-reducing rules have been introduced, so if G is a context-
sensitive grammar, so is G'.
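
As a minimal sketch of converttononterminal, assume that a grammar's rules are given as (left-hand side, right-hand side) tuples of symbol names and that no existing nonterminal is already called T_a, T_b, and so on. The function name and this representation are assumptions made for illustration only.

```python
def convert_to_nonterminal(rules, terminals):
    """Return the rules of an equivalent grammar in nonterminal normal form."""
    # Step 2: one fresh nonterminal T_c for each terminal c.
    t_name = {c: f"T_{c}" for c in terminals}
    # Step 3: replace every terminal in the original rules by its T_c.
    rewritten = [
        (tuple(t_name.get(s, s) for s in lhs),
         tuple(t_name.get(s, s) for s in rhs))
        for lhs, rhs in rules
    ]
    # The rules T_c -> c are added last so that step 3 leaves them alone.
    new_rules = [((t_name[c],), (c,)) for c in sorted(terminals)]
    return rewritten + new_rules
```

Each rewritten rule has the same length on both sides as the rule it came from, and each added rule maps one symbol to one symbol, so no length-reducing rule can appear.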

We can now state a set of closure theorems. The proofs of two of these theorems will
exploit nonterminal normal form as just defined.


THEOREM 24.7 Closure Under Union

Theorem: The context-sensitive languages are closed under union.

Proof: The proof is by construction of a context-sensitive grammar. The construction
is identical to the one we gave in the proof of Theorem 13.5 that the context-free
languages are closed under union: If L₁ and L₂ are context-sensitive
languages, then there exist context-sensitive grammars G₁ = (V₁, Σ₁, R₁, S₁) and
G₂ = (V₂, Σ₂, R₂, S₂) such that L₁ = L(G₁) and L₂ = L(G₂). If necessary, rename
the nonterminals of G₁ and G₂ so that the two sets are disjoint and so that
neither includes the symbol S. We will build a new grammar G such that
L(G) = L(G₁) ∪ L(G₂). G will contain all the rules of both G₁ and G₂. We add to
G a new start symbol, S, and two new rules, S → S₁ and S → S₂. The two new
rules allow G to generate a string iff at least one of G₁ or G₂ generates it. So
G = (V₁ ∪ V₂ ∪ {S}, Σ₁ ∪ Σ₂, R₁ ∪ R₂ ∪ {S → S₁, S → S₂}, S). Note that no
length-reducing rules are introduced, so the grammar that results is a context-
sensitive grammar.
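
The union construction can be sketched as follows. The representation is an assumption made for illustration: a grammar is a triple (nonterminals, rules, start), rules are pairs of symbol tuples, and renaming is done by appending the tags _1 and _2, which presumes that no terminal is named S or already carries such a tag. The function names are hypothetical.

```python
def rename(rules, nonterminals, tag):
    """Append tag to every nonterminal so that two grammars share none."""
    def r(s):
        return s + tag if s in nonterminals else s
    return [(tuple(map(r, lhs)), tuple(map(r, rhs))) for lhs, rhs in rules]

def union_grammar(g1, g2):
    """Each g is (nonterminals, rules, start); returns (rules, start)."""
    (n1, r1, s1), (n2, r2, s2) = g1, g2
    rules = rename(r1, n1, "_1") + rename(r2, n2, "_2")
    # The new start symbol S chooses which grammar to use.
    rules += [(("S",), (s1 + "_1",)), (("S",), (s2 + "_2",))]
    return rules, "S"
```

Both added rules rewrite one symbol as one symbol, so they cannot reduce length.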


THEOREM  24.8  Closure  Under  Concatenation 

Theorem:  The  context-sensitive  languages  are  closed  under  concatenation. 

Proof: The proof is by construction of a context-sensitive grammar. Again we use the
construction of Theorem 13.5: If L₁ and L₂ are context-sensitive languages, then
there exist context-sensitive grammars G₁ = (V₁, Σ₁, R₁, S₁) and G₂ = (V₂, Σ₂,
R₂, S₂) such that L₁ = L(G₁) and L₂ = L(G₂). If necessary, rename the nonterminals
of G₁ and G₂ so that the two sets are disjoint and so that neither includes the
symbol S. We will build a new grammar G such that L(G) = L(G₁)L(G₂). G will
contain all the rules of both G₁ and G₂. We add to G a new start symbol, S, and one
new rule, S → S₁S₂. So G = (V₁ ∪ V₂ ∪ {S}, Σ₁ ∪ Σ₂, R₁ ∪ R₂ ∪ {S → S₁S₂}, S).

However, now there is one problem that we need to solve: Suppose that one of
the original grammars contained a rule with Aa as its left-hand side. Figure 24.2
shows a partial parse tree that might be generated by the new grammar. The
problem is that Aa can match at the boundary between the substring that was
generated from S₁ and the one that was generated from S₂. That could result in a
string that is not the concatenation of a string in L₁ with a string in L₂. If only
nonterminal symbols could occur on the left-hand side of a rule, this problem
would be solved by the renaming step that guarantees that the two sets of nonterminals
(in the two original grammars) are disjoint. If G were in nonterminal normal
form, then that condition would be met.

FIGURE 24.2 Two subtrees may interact. (A partial parse tree in which S has the
children S₁ and S₂.)

So, to build a grammar G such that L(G) = L(G₁)L(G₂), we do the following:

1. Convert both G₁ and G₂ to nonterminal normal form.

2. If necessary, rename the nonterminals of G₁ and G₂ so that the two sets are
disjoint and so that neither includes the symbol S.

3. G = (V₁ ∪ V₂ ∪ {S}, Σ₁ ∪ Σ₂, R₁ ∪ R₂ ∪ {S → S₁S₂}, S).
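
The three steps can be combined in one short sketch. It assumes a grammar is given as (nonterminals, terminals, rules, start) with rules as pairs of symbol tuples, and it uses hypothetical names: each grammar's symbols receive the tag _1 or _2, and each terminal c is replaced by a nonterminal called T_c plus that tag. It also presumes that no symbol is already named S.

```python
def concat_grammar(g1, g2):
    """Each g is (nonterminals, terminals, rules, start); returns (rules, start)."""
    combined = []
    starts = []
    for tag, (nts, terms, rules, start) in (("_1", g1), ("_2", g2)):
        # Steps 1 and 2 at once: terminals become T_c<tag>, nonterminals get <tag>.
        name = {c: f"T_{c}{tag}" for c in terms}
        name.update({n: n + tag for n in nts})
        for lhs, rhs in rules:
            combined.append((tuple(name[s] for s in lhs),
                             tuple(name[s] for s in rhs)))
        # Each grammar keeps its own rules T_c<tag> -> c.
        combined += [((name[c],), (c,)) for c in sorted(terms)]
        starts.append(name[start])
    # Step 3: the new start symbol S derives S1 S2.
    combined.append((("S",), tuple(starts)))
    return combined, "S"
```

Because every left-hand side now consists only of nonterminals that belong to one grammar, no rule can match across the boundary between the two substrings.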


THEOREM  24.9  Closure  Under  Kleene  Star 


Theorem:  The  context-sensitive  languages  are  closed  under  Kleene  star. 

Proof: The proof is by construction of a context-sensitive grammar. If L₁ is a
context-sensitive language, then there exists a context-sensitive grammar
G₁ = (V₁, Σ₁, R₁, S₁) such that L₁ = L(G₁). To build a grammar G such that
L(G) = L(G₁)*, we can use a construction similar to that of Theorem 13.5, in
which we create a new start symbol and let it generate zero or more copies of
the original start symbol. But now we have two problems. The first is dealing
with ε. To solve it, we'll introduce two new symbols, S and T, instead of one.
We'll add to the original grammar G₁ a new start symbol S, which will be able
to be rewritten as either ε or T. T will then be recursive and will be able to
generate L(G₁)⁺. If necessary, rename the nonterminals of G₁ so that V₁ does
not include the symbol S or T. G will contain all the rules of G₁. Then we add
to G a new start symbol, S, another new nonterminal T, and four new rules:
S → ε, S → T, T → TS₁, and T → S₁.

But we also run into a problem like the one we just solved above for concatenation.
Suppose that the partial tree shown in Figure 24.3(a) can be created and
that there is a rule whose left-hand side is AA. Then that rule could be applied
not just to a string that was generated by S₁ but to a string at the boundary between
two instances of S₁. To solve this problem we can again convert the original
grammar to nonterminal normal form before we start. But now the two
symbols, AA, that are spuriously adjacent to each other were both derived from
instances of the same nonterminal (S₁), so creating disjoint sets of nonterminals
won't solve the problem. What we have to do this time is to create a copy of the
rules that can be used to derive an S₁. Let those rules derive some new nonterminal
S₂. The two sets of rules will do exactly the same thing but they will exploit
disjoint sets of nonterminals as they do so. Then we'll alternate them. So, for example,
to generate a string that is the concatenation of four strings from L₁, we'll
create the parse tree shown in Figure 24.3(b). Now, since neither S₁ nor S₂ can
generate ε, it can never happen that nonterminals from two separate subtrees
rooted by S₁ can be adjacent to each other, nor can it happen from two separate
subtrees rooted by S₂.

FIGURE 24.3 Preventing two subtrees from interacting.

We can now state the complete construction of a grammar G such that
L(G) = L(G₁)*:

1. Convert G₁ to nonterminal normal form.

2. If necessary, rename the nonterminals so they do not include S, T, T′, or S₂.

3. Create a new nonterminal S₂ and create copies (with different names) of all
the nonterminals and the rules in G₁ so that L(S₂) = L(S₁).

4. G = (V₁ ∪ {S, T, T′} ∪ {S₂ and other nonterminals generated in step 3},
   Σ₁,
   R₁ ∪ {S → ε, S → T, T → T′S₁, T → S₁, T′ → TS₂, T′ → S₂}
      ∪ {the rules that derive S₂, as generated in step 3},
   S).
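
A sketch of this construction follows, under stated assumptions: the input rules are already in nonterminal normal form (step 1 has been done), rules are pairs of symbol tuples, the empty tuple stands for ε, and the two alternating copies are named by appending the hypothetical tags _1 and _2, which also takes care of the renaming in step 2.

```python
def star_grammar(nts, rules, start):
    """Build rules for L(G1)* from G1 in nonterminal normal form.

    nts must list every nonterminal of G1, including the T_c symbols.
    Returns (rules, start) for the new grammar.
    """
    def copy(tag):
        def r(s):
            return s + tag if s in nts else s
        return [(tuple(map(r, lhs)), tuple(map(r, rhs))) for lhs, rhs in rules]

    s1, s2 = start + "_1", start + "_2"
    glue = [
        (("S",), ()),              # S -> ε
        (("S",), ("T",)),
        (("T",), ("T'", s1)),      # T and T' alternate the two copies
        (("T",), (s1,)),
        (("T'",), ("T", s2)),
        (("T'",), (s2,)),
    ]
    return glue + copy("_1") + copy("_2"), "S"
```

Adjacent blocks of a derived string always come from different copies, so a rule of one copy can never match symbols that straddle a block boundary.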
THEOREM 24.10 Closure Under Intersection


Theorem:  The  context-sensitive  languages  are  closed  under  intersection. 

Proof: This time we cannot pattern a proof after one we did for the context-free
languages since the context-free languages are not closed under intersection. But
we can do a proof by construction of an LBA. If L₁ and L₂ are context-sensitive
languages, then there exist LBAs B₁ = (K₁, Σ₁, Γ₁, Δ₁, s₁, H₁) and B₂ = (K₂, Σ₂,
Γ₂, Δ₂, s₂, H₂) such that L₁ = L(B₁) and L₂ = L(B₂). We construct a new LBA
B such that L(B) = L(B₁) ∩ L(B₂). B will treat its tape as though it were divided
into two tracks. It will first copy its input from track 1 to track 2. Then it will simulate
B₁ on track 1. If that simulation accepts, then B will simulate B₂ on track 2. If
that simulation also accepts, then B will accept. So B will accept iff both B₁ and
B₂ do.

THEOREM  24.11  Closure  Under  Complement 


Theorem:  The  context-sensitive  languages  are  closed  under  complement. 

Proof: The proof of this claim is based on a complexity argument, so we will delay it
until Chapter 29, but see Q.

24.1.6  Decision  Procedures  for  the  Context-Sensitive  Languages 

We have already shown that the membership question for context-sensitive languages
is decidable. Unfortunately, it does not appear to be efficiently decidable. Comparing
the situation of context-free languages and context-sensitive languages, we have, where
w is a string and G a grammar:

• If G is a context-free grammar, then there exists a O(n³) algorithm (as we saw in
Chapter 15) to decide whether w ∈ L(G).

• If G is a context-sensitive grammar, then the problem of deciding whether w ∈ L(G)
can be solved by the algorithm that we presented in the proof of Theorem 24.2. It is not
certain that no more efficient algorithm exists, but it is known that the decision problem
for context-sensitive languages is PSPACE-complete. (We'll define PSPACE-completeness
in Chapter 29.) The fact that the problem is PSPACE-complete means that
no polynomial-time algorithm exists for it unless there also exist polynomial-time
algorithms for large classes of other problems for which no efficient algorithm has yet been
found. More precisely, no polynomial-time algorithm for deciding membership in a
context-sensitive language exists unless P = NP = PSPACE, which is generally
thought to be very unlikely.


Because no efficient parsing techniques for the context-sensitive languages
are known, practical parsers for programming languages (G.4.2) and natural
languages (L.3.3) typically use a context-free grammar core augmented with
specific other mechanisms. They do not rely on context-sensitive grammars.

What about other questions we might wish to ask about context-sensitive languages?
We list some questions in Table 24.2, and we show their decidability for the
context-sensitive languages and also, for comparison, for the context-free languages.

We  prove  two  of  these  claims  about  the  context-sensitive  languages  here  and  leave  the 
others  as  exercises.  Since  we  have  shown  that  context-sensitive  grammars  and  LBAs  de¬ 
scribe  the  same  class  of  languages,  any  question  that  is  undecidable  for  one  will  also  be 
undecidable  for  the  other.  So  we  can  prove  the  decidability  of  a  question  by  using  either 
grammars  or  machines,  whichever  is  more  straightforward.  We'll  do  one  example  of  each. 

THEOREM  24.12  "Is  a  Context-Sensitive  Language  Empty?"  is  Undecidable 

Theorem: The language L₂ = {<B> : B is an LBA and L(B) = ∅} is not in D.

Proof: The proof is by reduction from H¬ANY = {<M> : there does not exist any
string on which Turing machine M halts}, which we showed, in Theorem 21.15, is
not even in SD. We will define R, a mapping reduction from H¬ANY to L₂. The
idea is that R will use the reduction via computation history technique described
in Section 22.5.1. Given a particular Turing machine M, it is straightforward to
build a new Turing machine M# that can determine whether a string x is a valid
computation history of M. M# just needs to check four things:

• The string x must be a syntactically legal computation history.

• The first configuration of x must correspond to M being in its start state, with
its read/write head positioned just to the left of the input.

• The last configuration of x must be a halting configuration.

• Each configuration after the first must be derivable from the previous one
according to the rules in M's transition relation δ.

Table 24.2 Decidability of questions about context-free and context-sensitive languages.

Question                        Decidable for context-free languages?   Decidable for context-sensitive languages?
Is L = Σ*?                      No                                      No
Is L₁ = L₂?                     No (but Yes for deterministic CFLs)     No
Is L₁ ⊆ L₂?                     No                                      No
Is L regular?                   No                                      No
Is ¬L also context-free?        No                                      -
Is ¬L also context-sensitive?   -                                       Yes, trivially, since the context-sensitive languages are closed under complement.
Is L = ∅?                       Yes                                     No
Is L₁ ∩ L₂ = ∅?                 No                                      No

In order to check these things, M# need never move off the part of its tape
that contains its input, so M# is in fact an LBA. Since a computation history must
end in a halting state, there will be no valid computation histories for M iff M
halts on nothing. So R is defined as follows:

R(<M>) =

1. Construct the description <M#> of an LBA M#(x) that operates as
follows:

1.1. If x is a valid computation history of M, accept, else reject.

2. Return <M#>.

If Oracle exists and decides L₂, then C = Oracle(R(<M>)) decides H¬ANY:

• R can be implemented as a Turing machine.

• C is correct:

• If <M> ∈ H¬ANY: There are no valid computation histories of the Turing
machine M, so the LBA M# accepts nothing. Oracle(<M#>) accepts.

• If <M> ∉ H¬ANY: There is at least one valid computation history of M, so
M# accepts at least one string. Oracle(<M#>) rejects.

But no machine to decide H¬ANY can exist, so neither does Oracle.
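
The four checks that M# performs can be sketched directly. The representation below is an assumption made for illustration and is not the string encoding used in the proof: M is taken to be deterministic, a configuration is a triple (state, tape, head position), the blank is written as an underscore, and the tape grows by one blank only when the head steps past its right end. The names step and is_valid_history are hypothetical.

```python
BLANK = "_"

def step(delta, config):
    """Return the successor of config under delta, or None if there is none."""
    state, tape, head = config
    move = delta.get((state, tape[head]))
    if move is None:
        return None
    new_state, write, direction = move      # direction is +1 or -1
    tape = tape[:head] + write + tape[head + 1:]
    head += direction
    if head < 0:
        return None
    if head == len(tape):
        tape += BLANK
    return (new_state, tape, head)

def is_valid_history(delta, start, halting, history):
    """Check the four conditions on a list of configurations."""
    if not history:
        return False
    # 1. Every configuration is syntactically legal.
    for config in history:
        if len(config) != 3:
            return False
        _, tape, head = config
        if not (isinstance(tape, str) and 0 <= head < len(tape)):
            return False
    # 2. Start state, head on the blank just to the left of the input.
    state, tape, head = history[0]
    if state != start or head != 0 or tape[0] != BLANK:
        return False
    # 3. The last configuration is a halting one.
    if history[-1][0] not in halting:
        return False
    # 4. Each configuration follows from the previous one.
    return all(step(delta, a) == b for a, b in zip(history, history[1:]))
```

Every check only reads the history it is given, which is the observation behind the claim that M# never needs to leave its input.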

THEOREM 24.13 "Is the Intersection of Two Context-Sensitive
Languages Empty?" is Undecidable

Theorem: The language L₂ = {<G₁, G₂> : G₁ and G₂ are context-sensitive grammars
and L(G₁) ∩ L(G₂) = ∅} is not in D.

Proof: The proof is by reduction from L₁ = {<G₁, G₂> : G₁ and G₂ are context-
free grammars and L(G₁) ∩ L(G₂) = ∅}, which we showed, in Theorem 22.9, is
not in D. Let R be a mapping reduction from L₁ to L₂ defined as follows:

R(<G₁, G₂>) =

1. Using the procedure that was described in the proof of Theorem 24.5, construct
from the two context-free grammars G₁ and G₂, two context-sensitive
grammars G₃ and G₄ such that L(G₃) = L(G₁) and L(G₄) = L(G₂).

2. Return <G₃, G₄>.

If Oracle exists and decides L₂, then C = Oracle(R(<G₁, G₂>)) decides L₁.
But no machine to decide L₁ can exist, so neither does Oracle.


24.2 The Chomsky Hierarchy

In 1956, Noam Chomsky described a slightly different version of the onion diagram
that we have been using. Chomsky's version, commonly called the Chomsky hierarchy,
is shown in Figure 24.4. This version is appealing because, for each level, there exists
both a grammar formalism and a computational structure. Chomsky used the terms
type 0, type 1, type 2, and type 3 to describe the four levels in his model and those terms
are still used in some treatments of this topic.

FIGURE 24.4 The Chomsky hierarchy.

The  basis  for  the  Chomsky  hierarchy  is  the  amount  and  organization  of  the  memory 
required  to  process  the  languages  at  each  level. 

•  type  0  (semidecidable):  no  memory  constraint 

•  type  1  (context-sensitive):  memory  limited  by  the  length  of  the  input  string 

• type 2 (context-free): unlimited memory but accessible only in a stack (so only a
finite amount is accessible at any point)

• type 3 (regular): finite memory

The Chomsky hierarchy makes an obvious suggestion: Different grammar formalisms
offer different descriptive power and may be appropriate for different tasks.
In the years since Chomsky published the hierarchy, that idea, coupled with the need to
solve real problems, has led to the development of many other formalisms. We will
sketch two of them in the rest of this chapter.

24.3  Attribute,  Feature,  and  Unification  Grammars 

For many applications, context-free grammars are almost, but not quite, good enough.
While they may do a good job of describing the primary structure of the strings in a
language, they make it difficult, and in some cases impossible, to describe constraints
on the way in which sibling constituents may be derived. For example, we saw that no
context-free grammar exists for the simple artificial language AⁿBⁿCⁿ = {aⁿbⁿcⁿ :
n ≥ 0}. The context-free grammar formalism provides no way to express the constraint
that the numbers of a's, b's, and c's must be equal. We've seen (in Example 23.1)
an unrestricted grammar for AⁿBⁿCⁿ. But we've also seen that unrestricted grammars
are impractical.

What  we  need  is  a  new  technique  for  describing  constraints  on  sets  of  constituents. 
The  approach  that  we  describe  next  treats  both  terminals  and  nonterminals  not  as 
atomic  symbols  but  rather  as  clusters  of  features  (or  attributes)  and  associated  values. 
Then  it  allows  rules  to: 

•  Define  ways  in  which  features  are  passed  up  and  down  in  parse  trees. 

•  Describe  constraints  on  feature  values  that  must  be  satisfied  before  the  rules  can 
be  applied. 


EXAMPLE 24.2 An Attribute Grammar for AⁿBⁿCⁿ

We'll show an attribute grammar G for the language AⁿBⁿCⁿ. G will be a context-
free grammar that has been augmented with one feature, size. The rules in G will
define how size is used. Some rules will compute size and pass it up, from the terminal
nodes, to the root. The single S rule will contain the description of a size
constraint that must be satisfied before the rule can be applied.

G = ({S, A, B, C, a, b, c}, {a, b, c}, R, S), where:

R = { S → A B C     (size(A) = size(B) = size(C))
      A → a         (size(A) ← 1)
      A → A₂ a      (size(A) ← size(A₂) + 1)
      B → b         (size(B) ← 1)
      B → B₂ b      (size(B) ← size(B₂) + 1)
      C → c         (size(C) ← 1)
      C → C₂ c      (size(C) ← size(C₂) + 1) }.

In this example, each rule has been annotated with an attribute expression.
Read the notation A₂ as the name for the daughter constituent, rooted at A, created
by the rule that refers to it. This grammar could easily be used by a bottom-
up parser that builds the maximal A, B, and C constituents, assigns a size to each,
and then attempts to combine the three of them into a single S. The combination
will succeed only if all the sizes match.
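
The bottom-up strategy just described can be sketched as follows. This is an illustration rather than a general attribute-grammar parser: the function names are hypothetical, and the loop in size_of_run stands in for repeated use of the rules X → x and X → X₂ x, adding 1 to size at each step.

```python
def size_of_run(s, pos, ch):
    """Build the maximal constituent made of ch starting at pos.

    Returns (size, next_pos); size is the attribute passed up to the root.
    """
    size = 0
    while pos < len(s) and s[pos] == ch:
        size += 1          # size(X) <- size(X2) + 1
        pos += 1
    return size, pos

def accepts(s):
    """Apply S -> A B C, subject to size(A) = size(B) = size(C)."""
    a, pos = size_of_run(s, 0, "a")
    b, pos = size_of_run(s, pos, "b")
    c, pos = size_of_run(s, pos, "c")
    # The rules as written have no ε-rules, so each constituent has size >= 1.
    return pos == len(s) and a >= 1 and a == b == c
```

Note that the rules of G, as written, do not generate the empty string, so this sketch rejects it as well.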


The fact that it could be useful to augment context-free grammars with various kinds
of features and constraints has been observed both by the writers of grammars for
programming languages and the writers of grammars of natural languages, such as English.
In the programming languages and compilers world, these grammars tend to be called
attribute grammars. In the linguistics world, they tend to be called feature grammars or
unification grammars (the latter because of their reliance on a matching process, called
unification, that decides when there is a match between features and constraints).

EXAMPLE  24.3  A  Unification  Grammar  Gets  Subject/Verb  Agreement 
Right 

In Example 11.6, we presented a simple fragment of an English grammar. That
fragment is clearly incomplete; it fails to generate most of the sentences of
English. But it also overgenerates. For example, it can generate the following sentence
(marked with an * to show that it is ungrammatical):

* The bear like chocolate.

The problem is that this sentence was generated using the rule S → NP VP.
Because the grammar is context-free, the NP and VP constituents must be
realized independently. So there is no way to implement the English constraint that
present tense verbs must agree with their subjects in person and number.

We can solve this problem by replacing the simple nonterminals NP (Noun
Phrase) and VP (Verb Phrase) by compound ones that include features
corresponding to person and number. One common way to do that is to represent
everything, including the primary category, as a feature. So, instead of NP and VP,
we might have:

[CATEGORY NP            [CATEGORY VP
 PERSON   THIRD          PERSON   THIRD
 NUMBER   SINGULAR]      NUMBER   SINGULAR]

Instead of atomic terminal symbols like bear, we might have:

[CATEGORY N
 LEX      bear
 PERSON   THIRD
 NUMBER   SINGULAR]

Instead of grammar rules like S → NP VP, we will now have rules that are
stated in terms of feature sets. The idea is that we will write rules that describe
constraints on how features must match in order for constituents to be combined.
So, for example, the S → NP VP rule might become:

[CATEGORY S] → [CATEGORY NP      [CATEGORY VP
                NUMBER   x1       NUMBER   x1
                PERSON   x2]      PERSON   x2]

This rule exploits two variables, x1 and x2, to describe the values of the
NUMBER and PERSON features. Whenever a particular NP is constructed, it will (by
a mechanism that we won't go into) acquire values for its NUMBER and PERSON
features from its constituents (usually the head noun, such as bear). The same
thing will happen for each individual VP. The scope of the variables x1 and x2
should be taken to be the entire rule, which will thus be interpreted to say that an
NP and a VP can be combined to form an S iff they have matching values for their
NUMBER and PERSON features. We've oversimplified here by suggesting that the
only way for values to match is for them to be identical. Practical systems typically
exploit a more powerful notion of matching. For example, past tense verbs in
English aren't marked for number. So a VP that dominated the verb shot, for
instance, would have a NUMBER value that would enable it to combine with an NP
whose NUMBER was either SINGULAR or PLURAL.
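
One simple way to picture this more permissive kind of matching is to let each feature hold a set of admissible values and to treat two features as matching when the sets overlap. The sketch below does exactly that. It is an assumption made for illustration, not the mechanism of any particular unification formalism, and the name unify, the dictionary layout, and the feature values given to the sample constituents are all hypothetical.

```python
ALL_PERSONS = {"FIRST", "SECOND", "THIRD"}


def unify(np, vp, features=("NUMBER", "PERSON")):
    """Return the shared feature values if NP and VP agree, else None.

    Each feature value is a set of admissible atomic values, so an
    unmarked feature is represented by the set of all its possible values.
    """
    shared = {}
    for feature in features:
        common = np[feature] & vp[feature]
        if not common:            # no value satisfies both constituents
            return None
        shared[feature] = common
    return shared


# Illustrative constituents (feature assignments are assumptions).
the_bear = {"CATEGORY": "NP", "NUMBER": {"SINGULAR"}, "PERSON": {"THIRD"}}
likes = {"CATEGORY": "VP", "NUMBER": {"SINGULAR"}, "PERSON": {"THIRD"}}
like = {"CATEGORY": "VP", "NUMBER": {"PLURAL"}, "PERSON": {"THIRD"}}
shot = {"CATEGORY": "VP", "NUMBER": {"SINGULAR", "PLURAL"}, "PERSON": ALL_PERSONS}
```

Under these assumptions, unify(the_bear, like) fails, which blocks "The bear like chocolate", while unify(the_bear, shot) succeeds because the past tense verb leaves NUMBER open.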


Several important natural language grammar formalisms are feature
(unification)-based. Grammars written in those formalisms exploit features
that describe agreement constraints between subjects and verbs, between
nouns and their modifiers, and between verbs and their arguments, to name
just a few. (L.3.3) They may also use semantic features, both as additional
constraints on the way in which sentences can be generated and as the basis
for assigning meanings to sentences once they have been parsed.


Both the formal power and the computational efficiency of attribute/feature/unification
grammars depend on the details of how features are defined and used. Not all
attribute/feature/unification grammar formalisms are stronger than context-free
grammars. In particular, consider a formalism that requires that both the number of features
and the number of values for each feature must be finite. Then, given any grammar G
in that formalism, there exists an equivalent context-free grammar G'. The proof of
this claim is straightforward and is left as an exercise. With this restriction then,
attribute/feature/unification grammars are simply notational conveniences. In English, there
are only two values (singular and plural) for syntactic number and only three values
(first, second and third) for person. So the grammar that we showed in Example 24.3
can be rewritten as a (longer and more complex) context-free grammar.
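
To see why finiteness is enough, note that a rule with feature variables can be replaced by one ordinary context-free rule for each combination of values. The following sketch expands the agreement rule of Example 24.3 in this way. The naming scheme for the compound nonterminals (for example NP_SINGULAR_THIRD) and the function name expand_agreement_rule are hypothetical choices made only for this illustration.

```python
from itertools import product

NUMBERS = ("SINGULAR", "PLURAL")
PERSONS = ("FIRST", "SECOND", "THIRD")


def expand_agreement_rule():
    """Expand S -> NP VP (with shared NUMBER and PERSON) into plain CFG rules.

    Each combination of feature values becomes its own pair of atomic
    nonterminals, so agreement is enforced by the nonterminal names.
    """
    rules = []
    for number, person in product(NUMBERS, PERSONS):
        np = f"NP_{number}_{person}"
        vp = f"VP_{number}_{person}"
        rules.append(("S", (np, vp)))
    return rules
```

The single feature-based rule becomes 2 x 3 = 6 context-free rules, which is why the equivalent context-free grammar is longer even though it generates the same language.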

Now consider the grammar that we showed in Example 24.2. The single attribute
size can take an arbitrary integer value. We know that no context-free equivalent of
that grammar exists. When the number of attribute-value pairs is not finite, the power
of a grammar formalism depends on the way in which attributes can be computed and
evaluated. Some formalisms have the power of Turing machines.


Grammars, augmented with attributes and constraints, can be used in a wide
variety of applications. For example, they can describe component libraries and
product families.

Particularly in the attribute grammar tradition, it is common to divide attributes
into two classes:

•  synthesized  attributes,  which  are  passed  up  the  parse  tree,  and 

• inherited attributes, which are passed down the tree.

Both  of  the  examples  that  we  have  presented  use  synthesized  attributes,  which  are 
particularly  well-suited  to  use  by  bottom-up  parsers.  Inherited  attributes,  on  the  other 
hand,  are  well-suited  to  use  by  top-down  parsers. 

One appeal of attribute/feature/unification grammars is that features can be used
not just as a way to describe constraints on the strings that can be generated. They may
also be used as a way to construct the meanings of strings. Assume that we are working
with a language for which a compositional semantic interpretation function (as defined
in Section 2.2.6) exists. Then the meaning of anything other than a primitive structure
is a function of the meanings of its constituent structures. So, to use attributes as a way
to compute meanings for the strings in a language L, we must:

•  Create  a  set  of  attributes  whose  values  will  describe  the  meanings  of  the  primitives 
of  L.  For  English,  the  primitives  will  typically  be  words  (or  possibly  smaller  units, 
like  morphemes).  For  programming  languages,  the  primitives  will  be  variables,  con¬ 
stants,  and  the  other  primitive  language  constructs. 

• Associate with each grammar rule a rule that describes how the meaning attributes
of each element of the rule's right-hand side should be combined to form the
meaning of the left-hand side. For example, the English rule S → NP VP can specify that
the meaning of an S is a structure whose subject is the meaning of the constituent NP
and whose predicate is the meaning of the constituent VP.


Attribute grammars for programming languages were introduced as a way to
define the semantics of programs that were written in those languages. They
can be a useful tool for parser generators.


24.4  Lindenmayer  Systems 

Lindenmayer systems, or simply L-systems, were first described by Aristid Lindenmayer,
a biologist whose goal was to model plant development and growth. L-systems are
grammars. They use rules to derive strings. But there are three differences between L-systems
and the other grammar formalisms we have discussed so far. These differences arise
from the fact that L-systems were designed not to define languages but rather to model
ongoing, dynamic processes.

The first difference is in the way in which rules are applied. In all of our other
grammar formalisms, rules are applied sequentially. In L-systems, as in the Game of Life and
the other cellular automata that we mentioned in Chapter 18, rules are applied, in
parallel, to all the symbols in the working string. For example, think of each working string
as representing an organism at some time t. At time t + 1, each of its cells will have
changed according to the rules of cell development. Or think of each working string as
representing a population at some time t. At time t + 1, each of the individuals will
have matured, died, or reproduced according to the rules of population change.

The second difference is in what it means to generate a string. In all our other
grammar formalisms, derivation continues at least until no nonterminal symbols remain in the
working string. Only strings that contain no nonterminal symbols are considered to have
been generated by the grammar. In L-systems, because we are modeling a process, each
of the working strings will be considered to have been generated by the grammar. The
distinction between terminals and nonterminals disappears, although there may be some
symbols that will be treated as constants (i.e., no rules apply to them).

The third difference is that we will start with an initial string (of one or more
symbols), rather than just an initial symbol.

An L-system G is a triple (Σ, R, ω), where:

• Σ is an alphabet, which may contain a subset C of constants, to which no rules will
apply,

• R is a set of rules, and

• ω (the start sequence) is an element of Σ*.

Each rule in R is of the form: αAβ → γ, where:

• A ∈ Σ. A is the symbol that is to be rewritten by the rule.

• α, β ∈ Σ*. α and β describe context that must be present in order for the rule to
fire. If they are equal to ε, no context is checked.

• γ ∈ Σ*. γ is the string that will replace A when the rule fires.

The most straightforward way to describe L(G), the set of strings generated by an
L-system G, is to specify an interpreter for G. We do that as follows:

L-system-interpret(G: L-system) =

1. Set working-string to ω.

2. Do forever:

   2.1. Output working-string.

   2.2. new-working-string = ε.

   2.3. For each symbol c in working-string (moving left to right) do:

        If possible, choose a rule r whose left-hand side matches c and where c's
        neighbors (in working-string) satisfy any context constraints included in r.

        If a rule r was found, concatenate its right-hand side to the right end
        of new-working-string.

        If none was found, concatenate c to the right end of new-working-string.

   2.4. working-string = new-working-string.
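
The interpreter can be sketched directly in Python. The version below is one possible rendering under stated assumptions: each rule is a tuple (alpha, A, beta, gamma), contexts are matched literally against the text immediately to the left and right of the symbol, and when several rules match, the first one listed is chosen. The function name lsystem is hypothetical. Because the loop never terminates, it is written as a generator so that a caller can take as many working strings as it needs.

```python
from itertools import islice


def lsystem(rules, start):
    """Yield the successive working strings of an L-system.

    rules: list of (alpha, A, beta, gamma) tuples. The symbol A is rewritten
    as gamma when alpha appears immediately to its left and beta immediately
    to its right in the current working string (empty strings mean no check).
    """
    working = start
    while True:
        yield working                                   # step 2.1
        new_working = []                                # step 2.2
        for i, c in enumerate(working):                 # step 2.3
            for alpha, symbol, beta, gamma in rules:
                if (c == symbol
                        and working[:i].endswith(alpha)
                        and working[i + 1:].startswith(beta)):
                    new_working.append(gamma)
                    break
            else:
                new_working.append(c)                   # no rule applies
        working = "".join(new_working)                  # step 2.4


# Usage: the first five strings of a small context-free system.
rabbit_rules = [("", "I", "", "M"), ("", "M", "", "MI")]
first_five = list(islice(lsystem(rabbit_rules, "I"), 5))
```

Note that every context check is made against the old working string, which is what makes the rewriting parallel rather than sequential.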


In addition to their original purpose, L-systems have been used for
applications ranging from composing music (N.1.2) to predicting protein folding
to designing buildings.

Because each successive string is built by recursively applying the rules to the symbols
in the previous string, the strings that L-systems generate typically exhibit a property
called self-similarity. We say that an object is self-similar whenever the structure
exhibited by its constituent parts is very similar to the structure of the object taken as a whole.


EXAMPLE  24.4  Fibonacci's  Rabbits 
Let G be the L-system defined as follows:

Σ = {I, M}.

ω = I.

R = {I → M,
     M → M I}.

The sequence of strings generated by G begins:

0. I

1. M

2. M I

3. M I M

4. M I M M I

5. M I M M I M I M

6. M I M M I M I M M I M M I

If we describe each string by its length, then the sequence that G generates is
known as the Fibonacci sequence, defined as:

Fibonacci_0 = 1.

Fibonacci_1 = 1.

For n > 1, Fibonacci_n = Fibonacci_{n-1} + Fibonacci_{n-2}.

Fibonacci's goal, in defining the sequence that bears his name, was to model the
growth of an idealized rabbit population in which no one dies and each mature pair
produces a new male-female pair at each time step. Assume that it takes one time
step for each rabbit to reach maturity and mate. Also assume that the gestation
period of rabbits is one time step and that we begin with one pair of (immature)
rabbits. So at time step 0, there is 1 pair. At time step 1 there is still 1 pair, but they have
matured and mated. So, at time step 2, the original pair is alive and has produced
one new one. At time step 3, all pairs from time step 2 (of which there are 2) are still
alive and all pairs (of which there is just 1) that have been around at least two time
steps have produced a new pair. So there are 2 + 1 = 3 pairs. At time step 4, the 3
pairs from the previous step are still alive and the 2 pairs from two steps ago have
reproduced, so there are 3 + 2 = 5 pairs. And so forth.

[Figure: the rabbit pairs alive at successive time steps.]

Notice  that  the  strings  that  G  produces  mirror  this  structure.  Each  I  corre¬ 
sponds  to  one  immature  pair  of  rabbits  and  each  M  corresponds  to  one  mature 
pair.  Each  string  is  the  concatenation  of  its  immediately  preceding  string  (the  sur¬ 
vivors)  with  the  string  that  preceded  it  two  steps  back  (the  breeders). 
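
Both observations, that the lengths follow the Fibonacci sequence and that each string is the concatenation of its two predecessors, are easy to check mechanically. The short sketch below does so for the first few strings. The function name fib_strings is hypothetical and the code handles only this particular context-free system.

```python
def fib_strings(count):
    """Return the first `count` strings generated by the rabbit L-system."""
    rules = {"I": "M", "M": "MI"}
    current, strings = "I", []
    for _ in range(count):
        strings.append(current)
        # Apply the rules in parallel to every symbol of the current string.
        current = "".join(rules[symbol] for symbol in current)
    return strings


strings = fib_strings(8)
lengths = [len(s) for s in strings]
# Each string is its predecessor followed by the string two steps back.
concatenation_holds = all(
    strings[k] == strings[k - 1] + strings[k - 2] for k in range(2, len(strings))
)
```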


Leonardo Pisano Fibonacci lived from 1170 to 1250. Much more recently, the
L-system that describes the sequence that bears his name has been used to
model things as various as plant structure (O.2.2), limericks, and ragtime
music.


L-systems can be used to model two and three-dimensional structures by assigning
appropriate meanings to the symbols that get generated. For example, the turtle
geometry system provides a set of basic drawing primitives. A turtle program is
simply a string of those symbols. So we can use L-systems to generate turtle programs
and thus to generate two-dimensional images. Three-dimensional structures can be built
in a similar way.

Fractals are self-similar, recursive structures, so they are easy to generate using
L-systems.


EXAMPLE  24.5  Sierpinski  Triangle 

Let G be the L-system defined as follows:

Σ = {A, B, +, -}.

ω = A.

R = {A → B - A - B,
     B → A + B + A}.

Notice that + and - are constants. No rules transform them so they are simply
copied to each successive output string. The sequence of strings generated by G
begins:

1. A

2. B - A - B

3. A + B + A - B - A - B - A + B + A

4. B - A - B + A + B + A + B - A - B - A + B + A - B - A - B - A + B + A
   - B - A - B + A + B + A + B - A - B

We can interpret these strings as turtle programs by choosing a line length k
and then attaching meanings to the symbols in Σ as follows:

•  A  and  B  mean  move  forward,  drawing  a  line  of  length  k. 

•  +  means  turn  to  the  left  60°. 

•  —  means  turn  to  the  right  60°. 

Strings  3, 4, 8,  and  10  then  correspond  to  turtle  programs  that  can  draw  the  fol¬ 
lowing  sequence  of  figures  (scaling  k  appropriately): 


The limit of this sequence (assuming that an appropriate scaling factor is applied
at each step) is the fractal known as the Sierpinski triangle.

The growth of many natural structures can most easily be described as the
development of branches, which split into new branches, which split into new branches, and so
forth. We can model this process with an L-system by introducing into Σ two new
symbols: [ will correspond to a push operation and ] will correspond to a pop. If we are
interpreting the strings the L-system generates as turtle programs, push will push the
current pen position onto a stack. Pop will pop off the top pen position, pick up the
pen, return it to the position that is on the top of the stack, and then put it down again.

EXAMPLE  24.6  Trees 

Let  G  be  the  L-system  defined  as  follows: 

Σ = {F, +, -, [, ]}.

ω = F.

R = {F → F[-F]F[+F][F]}.

The  sequence  of  strings  generated  by  G  begins: 

1. F

2. F[-F]F[+F][F]

3. F[-F]F[+F][F][-F[-F]F[+F][F]]F[-F]F[+F][F][+F[-F]F[+F][F]][F[-F]F[+F][F]]

We  can  interpret  these  strings  as  turtle  programs  by  choosing  a  line  length  k 
and then attaching meanings to the symbols in Σ as follows:

•  F  means  move  forward,  drawing  a  line  of  length  k. 

•  +  means  turn  to  the  left  36°. 

•  -  means  turn  to  the  right  36°. 

•  [  means  push  the  current  pen  position  and  direction  onto  the  stack. 

•  ]  means  pop  the  top  pen  position/direction  off  the  stack,  lift  up  the  pen,  move 
it  to  the  position  that  is  now  on  the  top  of  the  stack,  put  it  back  down,  and  set 
its  direction  to  the  one  on  the  top  of  the  stack. 

Strings  2, 3, 4,  and  8  then  correspond  to  turtle  programs  that  can  draw  the  fol¬ 
lowing  sequence  of  figures  (scaling  k  appropriately): 


One note about these pictures: The reason that the number of line segments is
not consistently a power of 5 is that some lines are drawn on top of others.

Much more realistic trees, as well as other biological structures, can also be
described with L-systems. (O.2.2)
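
The turtle interpretations used in Examples 24.5 and 24.6 can be sketched as a single function that turns a program string into a list of line segments. This is an illustrative sketch, not a drawing package: the name turtle_segments, the starting position and heading, and the convention that ] simply restores the position and direction saved by the matching [ are all assumptions made here.

```python
import math


def turtle_segments(program, k=1.0, angle_deg=60.0, heading_deg=0.0):
    """Interpret a turtle program and return the line segments it draws.

    A, B and F draw a line of length k; + turns left and - turns right by
    angle_deg; [ saves the pen position and direction; ] restores them.
    Any other character (such as a space) is ignored.
    """
    x, y, heading = 0.0, 0.0, heading_deg
    stack, segments = [], []
    for symbol in program:
        if symbol in "ABF":
            new_x = x + k * math.cos(math.radians(heading))
            new_y = y + k * math.sin(math.radians(heading))
            segments.append(((x, y), (new_x, new_y)))
            x, y = new_x, new_y
        elif symbol == "+":
            heading += angle_deg
        elif symbol == "-":
            heading -= angle_deg
        elif symbol == "[":
            stack.append((x, y, heading))
        elif symbol == "]":
            x, y, heading = stack.pop()
    return segments


def distinct_segments(segments, digits=9):
    """Collapse segments that coincide after rounding their endpoints."""
    return {
        tuple(tuple(round(v, digits) for v in point) for point in segment)
        for segment in segments
    }


# The third string of Example 24.6, drawn upward with 36 degree turns.
tree_rule = "F[-F]F[+F][F]"
third_string = tree_rule.replace("F", tree_rule)
tree = turtle_segments(third_string, angle_deg=36.0, heading_deg=90.0)
```

The third string contains 25 occurrences of F, but distinct_segments(tree) is smaller than that, which illustrates the remark that some lines are drawn on top of others.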


So far, all of the L-systems that we have considered are context-free (because we
have put no context requirements on the left-hand sides of any of the rules) and
deterministic (because there is no more than one rule that matches any symbol).
Deterministic, context-free L-systems are called D0L-systems and are widely used. But we
could, for example, give up determinism and allow competing rules with the same
left-hand side. In that case, one common way to resolve the competition is to attach
probabilities to the rules and thus to build a stochastic L-system.

We  can  also  build  L-systems  that  are  not  context-free.  In  such  systems,  the  left-hand 
side  of  a  rule  may  include  contextual  constraints.  These  constraints  will  be  checked  be¬ 
fore  a  rule  can  be  applied.  But  the  constraints  do  not  participate  in  the  substitution  that 
the  rule  performs.  Each  rule  still  describes  how  a  single  symbol  is  to  be  rewritten. 


EXAMPLE  24.7  Sierpinski  Triangle,  Again 

Imagine a one-dimensional cellular automaton. Each cell may contain the value
black or white. At any point in time, the automaton consists of a finite number of
cells, although it may grow to the right from one step to the next. We will display
successive time steps on successive lines, with each cell immediately below its
position at the previous time step. With this arrangement, define the parents of a cell
at time t to be the cell immediately above it (i.e., itself at time t - 1) and (if it
exists) the one that is one row above it but shifted one cell to the left (i.e., its left
neighbor at time t - 1). Now we can state the rule for moving from one time step
to the next: At each time step after the first, a cell will exist if it would have at least
one parent. And its value will be

•  black  if  it  has  exactly  one  black  parent. 

•  white  otherwise. 

Note  that,  at  each  time  step,  one  additional  cell  to  the  right  will  have  a  parent 
(the  cell  above  and  to  its  left).  So  the  automaton  will  grow  to  the  right  by  one  black 
cell  at  each  time  step.  We  will  start  with  the  one-cell  automaton  ■.  After  32  steps, 
our  sequence  of  automata  will  draw  the  Sierpinski  triangle  shown  on  the  next  page. 

We can define G, a context-sensitive L-system, to generate this sequence. We'll
use the following notation for specifying context in the rules of G: The left-hand
side (a, b, ..., c) m (x, y, ..., z) will match the symbol m iff the symbol to its left is
any of the symbols in the list (a, b, ..., c) and the symbol to its right is any of the
symbols in the list (x, y, ..., z). The symbol ε will match iff the corresponding
context is empty. (Note that this differs from our usual interpretation of ε, in which it
matches everywhere.)




With those conventions, G is:

Σ = {■, □}.

ω = ■.

R = {(ε | □) ■ (ε) → ■ ■,
        /* This square is black with no black one to the left, so at t + 1
           there's exactly one black parent. The new cell is black. And
           there's no cell to the right, so add one, which also has one
           black parent so it too is black. */

     (ε | □) ■ (■ | □) → ■,
        /* This square is black and no black one to the left, so at t + 1
           there's exactly one black parent. The new cell is black. */

     (■) ■ (ε) → □ ■,
        /* Black, plus black to the left. Two black parents. New cell is
           white. No cell to the right, so add one. It has one black parent
           so it is black. */

     (■) ■ (■ | □) → □,
        /* Two black parents. New one is white. */

     (ε | □) □ (ε) → □ □,
        /* Two white parents. New one is white. Add cell to right. Its only
           parent is white, so it is white. */

     (ε | □) □ (■ | □) → □,
        /* Two white parents. New one is white. */

     (■) □ (ε) → ■ □,
        /* One black parent. New one is black. Add cell to right. Its only
           parent is white, so it is white. */

     (■) □ (■ | □) → ■}.
        /* One black parent. New one is black. */

G generates a Sierpinski triangle pointwise, while the L-system we described
in Example 24.5 generates one by drawing lines.
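
The cellular automaton itself, as opposed to the L-system that generates it, takes only a few lines to simulate. The sketch below follows the parent rule stated at the start of this example. Writing black as '#' and white as '.' is an arbitrary choice for display, and the function names ca_step and ca_rows are hypothetical.

```python
BLACK, WHITE = "#", "."


def ca_step(row):
    """Compute the next row: one cell longer, black iff exactly one black parent."""
    new_row = []
    for i in range(len(row) + 1):
        parents = []
        if i < len(row):
            parents.append(row[i])        # the cell immediately above
        if i >= 1:
            parents.append(row[i - 1])    # the cell above and to the left
        new_row.append(BLACK if parents.count(BLACK) == 1 else WHITE)
    return "".join(new_row)


def ca_rows(steps):
    """Return the initial one-cell row followed by `steps` successive rows."""
    rows = [BLACK]
    for _ in range(steps):
        rows.append(ca_step(rows[-1]))
    return rows


if __name__ == "__main__":
    for row in ca_rows(15):
        print(row)
```

Printing the rows one under another shows the triangle emerging. The rightmost cell of every row is black, which matches the observation that the automaton grows to the right by one black cell at each time step.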


Context-free L-systems do not have the power of Turing machines. But, if context is
allowed, L-systems are equivalent in power to Turing machines. So we can state the
following theorem:


THEOREM  24.14  Context-Sensitive  L-Systems  are  Turing  Equivalent 

Theorem: The computation of any context-sensitive L-system can be simulated by
some Turing machine. And the computation of any Turing machine can be
simulated by some deterministic, context-sensitive L-system.

Proof: The computation of any L-system can be simulated by a Turing machine
that implements the algorithm L-system-interpret. So it remains to show the
other direction.

The proof that the execution of any Turing machine can be simulated by some
deterministic, context-sensitive L-system is by construction. More precisely, we'll
show that Turing machine M, on input w, halts in some halting state q and with
tape contents v iff L-system L converges to the static string qv.

If M is not deterministic, create an equivalent deterministic machine and
proceed with it. Then, given M and w, define L as follows:

• Let Σ_L, the alphabet of L, be M's tape alphabet, augmented as follows:

    • Add the symbol 0 to encode M's start state.

    • If M has the halting state y, add the symbol y to encode it.

    • If M has the halting state n, add the symbol n to encode it.

    • If M has any other halting states, add the symbol h to encode all of them.

    • Add one distinct symbol for each nonhalting state of M.

• Let ω (L's start string) encode M's initial configuration. Configurations will
be encoded by a string that represents M's active tape, plus two blank squares
on each end. The symbol that represents M's current state will be inserted into
the string immediately to the left of the tape symbol that is under the
read/write head. We will follow our usual convention that, just before it starts,
M's read/write head is on the blank square just to the left of the first input
character. So ω = □□0□w□□.

• Let the rules R of L encode M's transitions. To do this, we exploit the fact that
the action of a Turing machine is very local. Things only change near the
read/write head. So, letting integers correspond to states, suppose that the
working string of L is ga4bcde. This encodes a configuration in which M's
read/write head is on the b and M is in state 4. The read/write head can move
one square to the left or one square to the right. Whichever way it moves, the
character under it can change. So, if it moves left, the a changes to some state
symbol, the 4 changes to an a, and the b changes to whatever it gets rewritten
as. If, on the other hand, the read/write head moves to the right, the 4 changes
to whatever the b gets rewritten as and the b gets rewritten as the new state
symbol. To decide how to rewrite some character in the working string, it is
sufficient to look at one character to its left and two to its right. If there is no
state symbol in that area, the symbol gets rewritten as itself. No rule need be
specified to make this happen. For all the combinations that do involve a state
symbol, we add to R rules that cause the system to behave as M behaves.
Finally, add rules so that, if h, y, or n is ever generated, it will be pushed all the
way to the left, leaving the rest of the string unchanged. Add no other rules to
R (and in particular no other rules involving any of the halting state symbols).

L  will  converge  to  qv  iff  M  halts,  in  state  q ,  with  v  on  its  tape. 


Exercises 

1. Write context-sensitive grammars for each of the following languages L. The
challenge is that, unlike with an unrestricted grammar, it is not possible to erase
working symbols.

a. AnBnCn = {a^n b^n c^n : n ≥ 0}.

b. WW = {ww : w ∈ {a, b}*}.

c. {w ∈ {a, b, c}* : #a(w) = #b(w) = #c(w)}.

2.  Prove  that  each  of  the  following  languages  is  context-sensitive. 

a. {a^n : n is prime}

b.  {a"‘:  n  s  0} 

c. {xwx^R : x, w ∈ {a, b}+ and |x| = |w|}

3.  Prove  that  every  context-free  language  is  accepted  by  some  deterministic  LBA. 

4.  Recall  the  diagonalization  proof  that  we  used  in  the  proof  of  Theorem  24.4, 
which  tells  us  that  the  context-sensitive  languages  are  a  proper  subset  of  D.  Why 
cannot  that  same  proof  technique  be  used  to  show  that  there  exists  a  decidable 
language  that  is  not  decidable  or  an  SD  language  that  is  not  decidable? 

5.  Prove  that  the  context-sensitive  languages  are  closed  under  reverse. 

6.  Prove  that  each  of  the  following  questions  is  undecidable. 

a. Given a context-sensitive language L, is L = Σ*?

b. Given a context-sensitive language L, is L finite?

c. Given two context-sensitive languages L1 and L2, is L1 = L2?

d. Given two context-sensitive languages L1 and L2, is L1 ⊆ L2?

e. Given a context-sensitive language L, is L regular?

7. Prove the following claim, made in Section 24.3: Given an
attribute/feature/unification grammar formalism that requires that both the number of features and the
number of values for each feature must be finite and a grammar G in that
formalism, there exists an equivalent context-free grammar G'.

8.  The  following  sequence  of  figures  corresponds  to  a  fractal  called  a  Koch  island: 


These figures were drawn by interpreting strings as turtle programs, just as we
did in Example 24.5 and Example 24.6. The strings were generated by an
L-system G, defined with:

Σ = {F, +, -}.

ω = F - F - F - F.

To interpret the strings as turtle programs, attach meanings to the symbols in Σ
as follows (assuming that some value for k has been chosen):

• F means move forward, drawing a line of length k.

• + means turn left 90°.

• - means turn right 90°.

Figure (a) was drawn by the first generation string ω. Figure (b) was drawn by
the second generation string, and so forth. R contains a single rule. What is it?


CHAPTER  25 

Computable  Functions  * 


In  almost  all  of  our  discussion  so  far,  we  have  focused  on  exactly  one  kind  of  problem: 
deciding  a  language.  We  saw  in  Chapter  2  that  other  kinds  of  problems  can  be  re¬ 
cast  as  language-decision  problems  and  so  can  be  analyzed  within  the  framework 
that  we  have  described.  But,  having  introduced  the  Turing  machine,  we  now  also  have 
a  way  to  analyze  programs  that  compute  functions  whose  range  is  something  other 
than {Accept, Reject}.


25.1  What  is  a  Computable  Function? 

Informally, a function is computable if there exists a Turing machine that can compute
it. In this section we will formalize that notion.

25.1.1  Total  and  Partial  Functions 

We begin by considering two classes of functions. Let f be an arbitrary function. Then:

• f is a total function on the domain Dom iff f is defined on all elements of Dom. This
is the standard mathematical definition of a function on a domain.

• f is a partial function on the domain Dom iff f is defined on zero or more elements
of Dom. This definition allows for the existence of elements of the domain on which
the function is not defined.


EXAMPLE  25.1  Total  and  Partial  Functions 

• Consider the successor function succ and the domain N (the natural numbers).
succ is a total function on N. It is also a partial function on N.

• Consider the simple string function midchar, which returns the middle character
of its argument string if there is one. The midchar function is a partial function
on the domain of strings. But it is not a total function on the domain of
strings, since it is undefined for strings of even length. It is, however, a total
function on the smaller domain of odd length strings.

• Consider the function steps, defined on inputs of the form <M, w>. It returns
the number of steps that Turing machine M executes, on input w, before it
halts. The steps function is a partial function on the domain {<M, w>}. But it
is not a total function on that domain, since it is undefined for values of
<M, w> where M does not halt on w. It is, however, a total function on the
smaller domain {<M, w> : Turing machine M halts on input w}.


Why do we want to expand the notion of a function to allow for partial functions? A
cleaner approach is simply to narrow the domain so that it includes only values on
which the function is defined. So, for example, in the case of the midchar function, we
simply assert that its domain is the set of odd length strings. Then we have a total
function and thus a function in the standard mathematical sense. Of course we can do the
same thing with the function steps: We can refine its domain to include only values on
which it is defined. But now we face an important problem given that our task is to
write programs (more specifically, to design Turing machines) that can compute
functions. The set of values on which steps is defined is the language H. And H is not in D
(i.e., it is not a decidable set). So, no matter what Turing machine we might build to
compute steps, there exists no other Turing machine that can examine a value and decide
whether the steps machine should be able to run.

Another way to think of this problem is that it is impossible for any implementation
of steps to check its precondition. The only way it is going to be possible to build an
implementation of steps is going to be to define its domain as some decidable set and
then allow that there are elements of that domain for which steps will not return a
value. Thus steps will be a partial and not a total function of the domain on which the
program that implements it runs. So any such program will fail to halt on some inputs.


25.1.2  Partially  Computable  and  Computable  Functions 

Recall that, in Section 17.2.2, we introduced the notion of a Turing machine that
computes an arbitrary function. In the rest of this section we will expand on the ideas that
we sketched there. In particular, we will now consider functions, like midchar and steps,
that are not defined on all elements of Σ*.

We begin by restating the basic definitions that we gave in Section 17.2.2:

• Let M be a Turing machine with start state s, halting state h, input alphabet Σ, and
tape alphabet Γ. The initial configuration of M will be (s, □w), where w ∈ Σ*.

• Define M(w) = z iff (s, □w) |-M* (h, □z). In other words, M(w) = z iff M, when
started on a string w in Σ*, halts with z on its tape and its read/write head is just to
the left of z.

• We say that a Turing machine M computes a function f iff, for all w ∈ Σ*:

  • If w is an input on which f is defined, M(w) = f(w). In other words, M halts with
    f(w) on its tape.

  • Otherwise M(w) does not halt.

• A function f is recursive or computable iff there is a Turing machine M that computes
it and that always halts.

But what about functions that are not defined on all elements of Σ*? They are not
computable under this definition. Let f be any function defined on some subset of Σ*.
Then f is partially computable iff there exists a Turing machine M that computes it. In
other words, M halts and returns the correct value for all inputs on which f is defined.
On all other inputs, M fails to halt.

Let f be any partially computable function whose domain is only a proper subset of
Σ*. Then any Turing machine that computes f will fail to halt on some inputs. But now
consider only those functions f such that the set of values Dom on which f is defined is
decidable. In other words, f is a total function on the decidable set Dom. For example,
midchar is such a function, defined on the decidable set of odd length strings. For any
such function f, we define a new function f' that is identical to f except that its range
includes one new value, which we will call Error. On any input z on which f is undefined,
f'(z) = Error. Given a Turing machine M that computes f, we can construct a new
Turing machine M' that computes f' and that always halts. Let Dom be the set of values
on which f is defined. Since Dom is in D, there is some Turing machine TF that decides
it. Then the following Turing machine M' computes f':

M'(x) =

1. Run TF on x.

2. If it rejects, output Error.

3. If it accepts, run M on x.

We  have  simply  put  a  wrapper  around  M.  The  job  of  the  wrapper  is  to  check  M's 
precondition  and  only  run  M  when  its  precondition  is  satisfied. This  is  the  technique  we 
use  all  the  time  with  real  programs. 
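
The wrapper construction can be sketched in ordinary code. The following Python fragment is only an illustration, not part of the formal development: the names with_precondition, has_odd_length, and ERROR are assumptions introduced here, with has_odd_length standing in for the decider TF and midchar standing in for M.

```python
ERROR = "Error"  # stand-in for the extra range value Error; midchar never returns it


def with_precondition(decides_domain, m):
    """Build M' from a decider for Dom (the role of TF) and a procedure m (the role of M)."""
    def m_prime(x):
        if not decides_domain(x):   # steps 1 and 2: run the decider, output Error on reject
            return ERROR
        return m(x)                 # step 3: the precondition holds, so run M
    return m_prime


def has_odd_length(x):
    """Decides the domain of midchar: the set of odd length strings."""
    return len(x) % 2 == 1


def midchar(x):
    """Returns the middle character; only meaningful on odd length strings."""
    return x[len(x) // 2]


midchar_prime = with_precondition(has_odd_length, midchar)
```

Because has_odd_length always returns an answer, midchar_prime returns on every input string. The same wrapper cannot be built for steps, since no decider exists for the set of inputs on which steps is defined.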

Using the wrapper idea we can now offer a broader and more useful definition of
computability: Let f be a function whose domain is some subset of Σ*. Then f is
computable iff there exists a Turing machine M that computes f' (as described above)
and that halts on all inputs. Equivalently, f is computable iff it is partially computable
and its domain is a decidable set.

Now suppose that f is a function whose domain and/or range is not a set of strings.
For example, both the domain and the range of the successor function succ are the
integers. Then f is computable iff all of the following conditions hold:

• There exist alphabets Σ and Σ'.

• There exists an encoding of the elements of the domain of f as strings in Σ*.

• There exists an encoding of the elements of the range of f as strings in Σ'*.

• There exists some computable function f' with the property that, for every w ∈ Σ*:

  • If w = <x> and x is an element of f's domain, then f'(w) = <f(x)>, and

  • If w is not the encoding of any element of f's domain (either because it is not
    syntactically well formed or because it encodes some value on which f is undefined),
    then f'(w) = Error.


EXAMPLE  25.2  The  Successor  Function  succ 
Consider  again  the  successor  function: 

succ: N → N,
succ(x) = x + 1.

We can encode both the domain and the range of succ in unary (i.e., as strings
drawn from {1}*). Then we can define the following Turing machine M to compute it:

M(x) =

1. Write 1.

2. Move left once.

3. Halt.

The function succ is a total function on N. Every element of Σ* = {1}* is the
encoding of some element of N. For each such element x, M computes f(x) and
halts. So succ is computable.
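
A minimal sketch of the same idea in Python, assuming the unary encoding described above. The helper names encode, decode, and succ_unary are hypothetical. Since M starts with its head on the blank square just to the left of its input, writing a 1 there amounts to prepending a 1 to the string.

```python
def encode(n):
    """Unary encoding of a natural number: n is written as a string of n 1's."""
    return "1" * n


def decode(s):
    """Inverse of encode: the number encoded by a string in {1}* is its length."""
    return len(s)


def succ_unary(x):
    """Mimics M: write one more 1 on the blank square just to the left of x."""
    if set(x) - {"1"}:
        raise ValueError("input must be a string over {1}")
    return "1" + x
```

Every string over {1} encodes some natural number, so there is no input on which an Error value would be needed.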


EXAMPLE  25.3  The  Function  midchar 

Consider again the function midchar that we introduced in Example 25.1. Recall
that midchar is a total function on the set of odd length strings and a partial
function on the set of strings. Now we want to build a Turing machine M to compute
midchar.

The most straightforward way to encode a string x as input to M is as itself. If
we do that, then we can build a straightforward Turing machine M that behaves as
follows on input x:

• If the length of x is odd, compute midchar(x).

• If the length of x is even, then what? By the definition of a machine that
computes a function f, M should loop on all values for which f is not defined. So it
must loop on all even length inputs.

The existence of M proves that midchar is partially computable. But midchar is
also computable because the following Turing machine M', which halts on all
inputs, computes midchar':

M'(x)  = 

1.  If  the  length  of  x  is  even,  output  Error. 

2.  Otherwise,  find  the  middle  character  of  x  and  output  it. 


EXAMPLE  25.4  The  Function  steps 

Consider again the function steps that we introduced in Example 25.1. Recall that
steps is a total function on the set {<M, w> : Turing machine M halts on input
w}. It is a partial function on the set {<M, w>}. And it is also a partial function
on the larger set of strings that includes syntactically ill-formed inputs. Steps is a
partially computable function because the following three-tape Turing machine S
computes it:

S(x) =

1. If x is not a syntactically well formed <M, w> string then loop.

2. If x is a syntactically well formed <M, w> string then:

2.1. Copy M to tape 3.

2.2. Copy w to tape 2.

2.3. Write 0 on tape 1.

2.4. Simulate M on w on tape 2, keeping a count on tape 1 of each step that
M makes.

S halts whenever its input is well-formed and M halts on w. If it halts, it has the
value of steps(<M, w>) on tape 1. By Theorem 17.1, there exists a one-tape Turing
machine S' whose output is identical to the value that S placed on tape 1. So
S' is a standard Turing machine that computes steps. The existence of S' proves
that steps is partially computable.

But steps is not computable. We show that it is not by showing that there exists
no Turing machine that computes the function steps', defined as:

steps'(x) = If x is not a syntactically well-formed <M, w> string, then Error.

            If x is well-formed but steps(<M, w>) is undefined (i.e., M does
            not halt on w), then Error.

            If steps(<M, w>) is defined (i.e., M halts on w), then
            steps(<M, w>).

We prove that no such Turing machine exists by reduction from H. Suppose
that there did exist such a machine. Call it ST. Then the following Turing machine
DH would decide the language H = {<M, w> : Turing machine M halts on
input string w}:

DH(<M, w>) =

1. Run ST(<M, w>).

2. If the result is Error then reject. Else accept.

But we know that there can exist no Turing machine to decide H. So ST must
not exist. So steps is not computable.
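
The step-counting simulation performed by S can be sketched in Python under some simplifying assumptions. The machine encoding used here (a dictionary from (state, symbol) to (state, symbol, move)), the names BLANK, HALT, and steps_bounded, and the two toy machines are all hypothetical, chosen only for illustration. The step budget max_steps is what makes the sketch always return; the function steps itself has no such budget, which is exactly why it is only partially computable.

```python
from collections import defaultdict

BLANK = "_"   # stand-in for the blank tape symbol
HALT = "h"    # name of the halting state


def steps_bounded(delta, w, max_steps, start="s"):
    """Simulate a one-tape machine on w, counting steps.

    Returns the step count if the machine halts within max_steps.
    Returns None if the budget runs out, which says nothing about
    whether the machine would eventually halt.
    """
    tape = defaultdict(lambda: BLANK)
    for i, ch in enumerate(w):
        tape[i + 1] = ch            # the head starts at position 0, just left of w
    state, head, count = start, 0, 0
    while state != HALT:
        if count >= max_steps:
            return None
        next_state, write, move = delta[(state, tape[head])]
        tape[head] = write
        head += 1 if move == "R" else -1
        state = next_state
        count += 1
    return count


# Toy machine: step onto the input, scan right across the 1's, halt at the first blank.
scan = {
    ("s", BLANK): ("q", BLANK, "R"),
    ("q", "1"): ("q", "1", "R"),
    ("q", BLANK): (HALT, BLANK, "R"),
}

# Toy machine that never halts: it moves right forever.
looper = {("s", BLANK): ("s", BLANK, "R")}
```

On the input 111, scan halts after 5 steps, while looper exhausts any budget it is given. No finite budget lets steps_bounded distinguish, in general, a machine that loops from one that simply needs more time.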


25.1.3  Functions  That  Are  Not  Partially  Computable 

There exist functions like succ and midchar that are computable. There exist functions
like steps that are partially computable but not computable. But there also exist
functions that are not even partially computable.


THEOREM  25.1  There  Exist  Functions  That  are  Not 
Even  Partially  Computable 

Theorem: There exist (a very large number of) functions that are not partially
computable.

Proof: We will use a counting argument similar to the one we used to prove a similar
result, Theorem 20.3, which says that there exist languages that are not
semidecidable. We will consider only unary functions from some subset of N (the
nonnegative integers) to N. Call the set of all such functions U. We will encode both the
input to functions in U and their outputs as binary strings.

Lemma: There is a countably infinite number of partially computable functions in U.

Proof of Lemma: Every partially computable function in U is computed by some
Turing machine M with Σ and Γ equal to {0, 1}. By Theorem 17.7, there exists an
infinite lexicographic enumeration of all such syntactically legal Turing machines.
So, by Theorem A.1, there is a countably infinite number of Turing machines that
compute functions in U. There cannot be more partially computable functions
than there are Turing machines, so there is at most a countably infinite number of
partially computable functions in U. There is not a one-to-one correspondence
between partially computable functions and the Turing machines that compute them
since there is an infinite number of Turing machines that compute any given
function. But the number of partially computable functions in U must be infinite
because it includes all the constant functions (which are also computable):

cf_1(x) = 1, cf_2(x) = 2, cf_3(x) = 3, ...

So there is a countably infinite number of partially computable functions in U.

Lemma: There is an uncountably infinite number of functions in U.

Proof of Lemma: For any element s in 𝒫(N) (the power set of N), let f_s be the
characteristic function of s. So f_s(x) = 1 if x ∈ s and 0 otherwise. No two elements of
𝒫(N) have the same characteristic function. By Theorem A.4, there is an
uncountably infinite number of elements in 𝒫(N), so there is an uncountably infinite
number of such characteristic functions, each of which is in U.

Proof of Theorem: Since there is only a countably infinite number of partially
computable functions in U and an uncountably infinite number of functions in U, there
is an uncountably infinite number of functions in U that are not partially
computable.

Now we know that there exist many functions that are not partially computable. But
can we describe one? The answer is yes. One way to do so is by diagonalization. Let E be
a lexicographic enumeration of the Turing machines that compute the partially
computable functions in U. Let M_i be the i-th machine in that enumeration. Define a new
function notcomp(x) as follows:

notcomp: N → {0, 1},

notcomp(x) = 1 if M_x(x) = 0, 0 otherwise.

So notcomp(x) = 0 if either M_x(x) is defined and the value is something other than
0 or if M_x(x) is not defined. This new function notcomp is in U, but it differs, in at least
one place, from every function that is computed by a Turing machine whose encoding
is listed in E. So there is no Turing machine that computes it. Thus it is not partially
computable.
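
The diagonal construction can be illustrated on a finite toy enumeration. This Python sketch is not an implementation of notcomp, which cannot be programmed: in the real enumeration M_x(x) may fail to halt, and there is no way to detect that. Here the list machines is a hypothetical stand-in for the first four entries of E, and an undefined value is modeled by None so that every call returns.

```python
# Hypothetical stand-ins for the first four machines in the enumeration E.
# Returning None models "undefined on this input".
machines = [
    lambda x: 0,                        # the constant function 0
    lambda x: None if x % 2 else 0,     # undefined on odd inputs
    lambda x: x + 1,                    # successor
    lambda x: x * x,                    # squaring
]


def notcomp_toy(x):
    """1 if the x-th listed function maps x to 0; 0 if it gives another value or is undefined."""
    return 1 if machines[x](x) == 0 else 0


# The diagonal function disagrees with the i-th listed function at input i.
diagonal = [notcomp_toy(i) for i in range(len(machines))]
```

Whatever function sits at position i, notcomp_toy differs from it at input i, so notcomp_toy cannot appear anywhere in the list. The argument in the text applies the same reasoning to the full, infinite enumeration.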


25.1.4 The Busy Beaver Functions

There exist even more straightforward total functions that are not partially computable.
One well known example is a family of functions called busy beaver functions. To
define two of these functions, consider the set T of all standard Turing machines M (i.e.,
deterministic, one-tape machines of the sort defined in Section 17.1), where M has tape
alphabet Γ = {□, 1} and M halts on a blank tape. Then:

• S(n) is defined by considering all machines that are in T and that have n nonhalting
states. The value of S(n) is the maximum number of steps that are executed by any
such n-state machine, when started on a blank tape, before it halts.

• Σ(n) is defined by again considering all machines that are in T and that have n
nonhalting states. The value of Σ(n) is the maximum number of 1's that are left on the
tape by any such n-state machine, when started on a blank tape, when it halts.

A variety of other busy beaver functions have also been defined. Some of them
allow three or more tape symbols (instead of the two we allow). Some use variants of
our Turing machine definition. For example, our versions are called quintuple versions,
since our Turing machines both write and move the read/write head at each step (so
each element of the transition function is a quintuple). One common variant allows
machines to write or to move, but not both, at each step (so each element of the
transition function is a quadruple). Quadruple machines typically require more steps than
quintuple machines require to perform the same task.


Table 25.1 Some values for the busy beaver functions.

n    S(n)              Σ(n)
1    1                 1
2    6                 4
3    21                6
4    107               13
5    ≥ 47,176,870      4098
6    > 3·10^1730       > 1.29·10^865


All of the busy beaver functions provide a measure of how much work a Turing
machine with n states can do before it halts. And none of them is computable. In a nutshell,
the reason is that their values grow too fast, as can be seen from Table 25.1, which
summarizes some of what is known about the values of S and Σ, as we defined them above.
For values of n greater than 4 (in the case of S) or 5 (in the case of Σ), the actual values
are not known but lower bounds on them are, as shown in the table. For the latest results
in  determining  these  bounds,  see  R. 
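
For very small n, the table entries can be reproduced by exhaustive search. The Python sketch below is an illustration under stated assumptions, not a general method: the encoding of machines, the names HALT, run_on_blank_tape, and busy_beaver_lower_bounds, and the step budget max_steps are all introduced here. Because halting cannot be decided, a machine that exceeds the budget is simply skipped, so in general the search yields only lower bounds on S(n) and Σ(n).

```python
from itertools import product

HALT = -1  # hypothetical marker for the single halting state


def run_on_blank_tape(machine, max_steps):
    """Run a quintuple machine on a blank tape.

    Returns (steps, ones) if it halts within max_steps, otherwise None.
    None only means that the budget ran out.
    """
    tape, head, state = {}, 0, 0            # state 0 is the start state
    for step in range(1, max_steps + 1):
        symbol = tape.get(head, 0)          # 0 stands for the blank, 1 for the symbol 1
        state, write, move = machine[(state, symbol)]
        tape[head] = write                  # each step both writes and moves
        head += move
        if state == HALT:
            return step, sum(tape.values())
    return None


def busy_beaver_lower_bounds(n, max_steps=50):
    """Search all machines with n nonhalting states; return (best steps, best ones)."""
    keys = [(q, s) for q in range(n) for s in (0, 1)]
    actions = [(q, w, m)
               for q in list(range(n)) + [HALT]
               for w in (0, 1)
               for m in (-1, 1)]
    best_steps = best_ones = 0
    for choice in product(actions, repeat=len(keys)):
        result = run_on_blank_tape(dict(zip(keys, choice)), max_steps)
        if result is not None:
            best_steps = max(best_steps, result[0])
            best_ones = max(best_ones, result[1])
    return best_steps, best_ones
```

With the default budget the search returns (1, 1) for n = 1 and (6, 4) for n = 2, matching the first two rows of Table 25.1. The number of machines grows as (4(n + 1))^(2n), and no fixed budget is adequate for all n, so this approach stops being useful almost immediately.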

THEOREM 25.2 S and Σ are Total Functions

Theorem: Both S and Σ are total functions on the positive integers.

Proof: For any value n, both S(n) and Σ(n) are defined iff there exists some
standard Turing machine M, with tape alphabet Γ = {□, 1}, where:

• M has n nonhalting states, and
• M halts on a blank tape.

We show by construction that such a Turing machine M exists for every integer
value of n ≥ 1. We will name the nonhalting states of M with the integers 1, ..., n.
We can build M as follows:

1. Let state 1 be the start state of M.

2. For all i such that 1 < i ≤ n, add to δ_M the transition ((i − 1, □), (i, □, →)).

3. Let M have a single halting state called h.

4. Add to δ_M the transition ((n, □), (h, □, →)).

M is a standard Turing machine with tape alphabet Γ = {□, 1}, it has n
nonhalting states, and it halts on a blank tape. It is shown in Figure 25.1.

FIGURE 25.1 A halting Turing machine with n nonhalting states.


So both S and Σ are defined on all positive integers. If they are not computable, it is not
because their domains are not in D. But they are not computable. We first prove a lemma
and then use it to show that both busy beaver functions, S and Σ, are not computable.

THEOREM 25.3 The Busy Beaver Functions are Strictly
Monotonically Increasing

Theorem: Both S and Σ are strictly monotonically increasing functions. In other
words:

S(n) < S(m) iff n < m, and

Σ(n) < Σ(m) iff n < m.

Proof: We must prove four claims:

• n < m → S(n) < S(m): Let S(n) = k. Then there exists an n-state Turing
machine TN that runs for k steps and then halts. From TN we can build an m-state
Turing machine TM that runs for k + (m − n) steps and then halts. We add
m − n states to TN. Let any state that was a halting state of TN cease to be a
halting state. Instead, make it go, on any input character, to the first new state,
write a 1, and move right. From that first new state, go, on any input character,
to the second new state, write a 1, and move right. Continue through all the new
states. Make the last one a halting state. This new machine executes k steps, just
as TN did, and then an additional m − n steps. Then it halts. So
S(m) ≥ S(n) + (m − n). Since m > n, (m − n) is positive. So S(m) > S(n).

• S(n) < S(m) → n < m: We can rewrite this as ¬(n < m) → ¬(S(n) < S(m))
and then as n ≥ m → S(n) ≥ S(m). If n = m, then S(m) = S(n). If n > m,
then by the first claim, proved above, S(n) > S(m).

• n < m → Σ(n) < Σ(m): Analogously to the proof that n < m → S(n) < S(m)
but substitute Σ for S.

• Σ(n) < Σ(m) → n < m: Analogously to the proof that S(n) < S(m) →
n < m, but substitute Σ for S.


THEOREM 25.4 The Busy Beaver Functions are Not Computable

Theorem: Neither S nor Σ is computable.

Proof: We will prove that S is not computable. We leave the proof of Σ as an
exercise.

Table 25.2 The number of states in Trouble_n

Component      Number of States
Write_n        n + 1
; R            1
Write_n        n + 1
L□             2
Multiply       m
L□             2
BB             b
Total          2n + m + b + 7


Suppose that S were computable. Then there would be some Turing machine
BB, with some number of states that we can call b, that computes it. For any
positive integer n, we can define a Turing machine Write_n that writes n 1's on its
tape, one at a time, moving rightwards, and then halts with its read/write head on
the blank square immediately to the right of the rightmost 1. Write_n has n
nonhalting states plus one halting state. We can also define a Turing machine
Multiply that multiplies two unary numbers, written on its tape and separated by
the ; symbol. The design of Multiply was an exercise in Chapter 17. Let m be the
number of states in Multiply.

Using the macro notation we described in Section 17.1.5, we can define, for any
positive integer n, the following Turing machine, which we can call Trouble_n:

>Write_n ; R Write_n L□ Multiply L□ BB

Trouble_n first writes a string of the form 1^n;1^n. It then moves its read/write head
back to the left so that it is on the blank square immediately to the left of that
string. It invokes Multiply, which results in the tape containing a string of exactly
n² 1's. It moves its read/write head back to the left and then invokes BB, which
outputs S(n²). The number of states in Trouble_n is shown in Table 25.2.

Since BB, the final step of Trouble_n, writes a string of length S(n²) and it can
write only one character per step, Trouble_n must run for at least S(n²) steps. Since,
for any n > 0, Trouble_n is a Turing machine with 2n + m + b + 7 states that
runs for at least S(n²) steps, we know that:

S(2n + m + b + 7) ≥ S(n²).

By Theorem 25.3, we know that S is monotonically increasing, so it must also be
true that, for any n > 0:

2n + m + b + 7 ≥ n².

But, since n² grows faster than n does, that cannot be true for all n: m and b are
constants, so for sufficiently large n, n² exceeds 2n + m + b + 7. In assuming that BB
exists, we have derived a contradiction. So BB does not exist. So S is not computable.

25.1.5 Languages and Functions

It should be clear by now that there is a natural correspondence between languages, which
may be in D, SD/D, or ¬SD, and functions, which may be computable, partially
computable, or neither. We can construct Table 25.3. It gives us now three ways to present a
computational problem.


25.2  Recursive  Function  Theory 

We  have  been  using  the  terms: 

• decidable, to describe languages that can be decided by some Turing machine,

• semidecidable, to describe languages that can be semidecided by some Turing machine,

• partially computable, to describe functions that can be computed by some Turing
machine, and

• computable, to describe functions that can be computed by some Turing machine
that halts on all inputs.


Table 25.3 The problem, language, and functional views.

1. The Problem View: Given three natural numbers, x, y, and z, is z = x·y?
   The Language View: {<x> × <y> = <z> : x, y, z ∈ {0, 1}* and
   num(x) · num(y) = num(z)}. (D)
   The Functional View: f: N × N → N, f(x, y) = x·y. (Computable)

2. The Problem View: Given a Turing machine M, does M have an even number of states?
   The Language View: {<M> : TM M has an even number of states}. (D)
   The Functional View: f: {<M>} → Boolean, f(<M>) = True if TM M has an even
   number of states, False otherwise. (Computable)

3. The Problem View: Given a Turing machine M and a string w, does M halt on w in
   n steps?
   The Language View: {<M, w, n> : TM M halts on w in n steps}. (SD/D)
   The Functional View: f: {<M, w>} → N, f(<M, w>) = if TM M halts on w then the
   number of steps it executes before halting, else undefined. (Partially computable)

4. The Problem View: Given a Turing machine M, does M halt on all strings in no
   more than n steps?
   The Language View: {<M, n> : TM M halts on each element of Σ* in no more than
   n steps}. (¬SD)
   The Functional View: f: {<M>} → N, f(<M>) = if TM M halts on all strings then
   the maximum number of steps it executes before halting, else undefined.
   (Not partially computable)

The more traditional terminology is:

• recursive for decidable,

• recursively enumerable for semidecidable. The recursively enumerable languages are
often called just the RE or r.e. languages,

• partial recursive for partially computable,

• recursive for computable.

Before we continue, we need to issue one warning about the fact that there is no
standard definition for some of these terms. The terms computable and recursive are used in
some discussions, including this one, to refer just to functions that can be computed by a
Turing machine that always halts. In some other discussions, they are used to refer to the
class we have called the partial recursive or the partially computable functions.

Why are the computable functions traditionally called recursive? The word makes
sense if you think of recursive as a synonym for computable. In this section, we will see
why recursive is a reasonable synonym for computable. In the rest of this section, to be
compatible with conventional treatments of this subject, we will use the term recursive
function to mean computable function.

A recursive function is one that can be computed by a Turing machine that halts on
all inputs. A partial recursive function is one that can be computed by some Turing
machine (but one that may loop if there are any inputs on which the function is undefined).
So we have definitions, stated in terms of a computational framework, for two
important classes of functions. Let's now ask a different question: Are there definitions of the
same classes of functions that do not appeal to any model of computation but that can
instead be derived from standard mathematical tools, including the definition of a small
set of primitive functions and the ability to construct new functions using operators such
as composition and recursion? The answer is yes.

In the rest of this section we will develop such a definition for a class of functions
that turns out to be exactly, given an appropriate encoding, the recursive functions.
And we will develop a similar definition for the class of recursively enumerable
functions. We will build a theory of functions, each of which has a domain that is an ordered
n-tuple of natural numbers and a range that is the natural numbers. We have already
shown that numbers can be represented as strings and strings can be represented as
numbers, so there is no fundamental incompatibility between the theory we are about
to describe and the one, based on Turing machines operating on strings, that we have
already considered.


25.2.1  Primitive  Recursive  Functions 

We begin by defining the primitive recursive functions to be the smallest class of
functions from N × N × ... × N to N that includes:

• the constant function 0,

• the successor function: succ(n) = n + 1, and

• a family of projection functions: for any 0 < j ≤ k, p_{k,j}(n_1, n_2, ..., n_k) = n_j,

and that is closed under the operations:

• composition of g with h_1, h_2, ..., h_k;

• primitive recursion of f in terms of g and h:

  • f(n_1, n_2, ..., n_k, 0) = g(n_1, n_2, ..., n_k). This is the base case.

  • f(n_1, n_2, ..., n_k, m + 1) = h(n_1, n_2, ..., n_k, m, f(n_1, n_2, ..., n_k, m)). Note that in
    this, the recursive case, the function h takes a large number of arguments. It need
    not, however, use all of them, since the projection functions make it possible to
    select only those arguments that are needed.

EXAMPLE  25.5  Primitive  Recursive  Functions  Perform  Arithmetic 

To make these examples easier to read, we will define the constant 1 = succ(0).
All of the following functions are primitive recursive:

• The function plus, which adds two numbers:

plus(n, 0) = p_{1,1}(n) = n.

plus(n, m + 1) = succ(p_{3,3}(n, m, plus(n, m))).

For clarity, we will simplify our future definitions by omitting the explicit calls
to the projection functions. Doing that here, we get:

plus(n, 0) = n.

plus(n, m + 1) = succ(plus(n, m)).

• The function times:

times(n, 0) = 0.

times(n, m + 1) = plus(n, times(n, m)).

• The function factorial, more usually written n!:

factorial(0) = 1.

factorial(n + 1) = times(succ(n), factorial(n)).

• The function exp, more usually written n^m:

exp(n, 0) = 1.

exp(n, m + 1) = times(n, exp(n, m)).

• The predecessor function pred, which is defined as follows:

pred(0) = 0.
pred(n + 1) = n.
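
The definitions in this example can be transcribed almost directly into code. The following Python sketch is one possible rendering, not a canonical one: the helper names zero, proj, compose, and primitive_recursion are assumptions introduced here, and the recursion is evaluated with a loop that builds f(..., m) up from the base case rather than by recursive calls.

```python
def zero(*args):
    """The constant function 0, usable with any number of arguments."""
    return 0


def succ(n):
    return n + 1


def proj(k, j):
    """The projection function p_{k,j}: return the j-th of k arguments."""
    def p(*args):
        assert len(args) == k
        return args[j - 1]
    return p


def compose(g, *hs):
    """Composition of g with h_1, ..., h_k: apply each h_i to the same arguments."""
    return lambda *args: g(*(h(*args) for h in hs))


def primitive_recursion(g, h):
    """Build f with f(ns, 0) = g(ns) and f(ns, m + 1) = h(ns, m, f(ns, m))."""
    def f(*args):
        *ns, m = args
        value = g(*ns)                 # base case
        for i in range(m):             # unrolls the recursive case from 0 up to m
            value = h(*ns, i, value)
        return value
    return f


one = compose(succ, zero)                                    # the constant 1 = succ(0)
plus = primitive_recursion(proj(1, 1), compose(succ, proj(3, 3)))
times = primitive_recursion(zero, compose(plus, proj(3, 1), proj(3, 3)))
factorial = primitive_recursion(
    one, compose(times, compose(succ, proj(2, 1)), proj(2, 2)))
exp = primitive_recursion(one, compose(times, proj(3, 1), proj(3, 3)))
pred = primitive_recursion(zero, proj(2, 1))
```

For example, plus(2, 3) evaluates to 5, factorial(5) to 120, and pred(0) to 0. Every function built this way returns after a number of loop iterations fixed by its last argument, which is the informal reason that primitive recursive functions always halt.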
Many other straightforward functions are also primitive recursive. We may now wish
to ask, "What is the relationship between the primitive recursive functions and the
computable functions?" All of the primitive recursive functions that we have considered so
far are computable. Are all primitive recursive functions computable? Are all
computable functions primitive recursive? We will answer these questions one at a time.

THEOREM 25.5 Every Primitive Recursive Function is Computable

Theorem:  Every  primitive  recursive  function  is  computable. 

Proof: Each of the basic functions, as well as the two combining operations, can be
implemented in a straightforward fashion on a Turing machine or using a
standard programming language. We omit the details.

THEOREM  25.6  Not  Every  Computable  Function  is  Primitive  Recursive 

Theorem:  There  exist  computable  functions  that  are  not  primitive  recursive. 

Proof:  The  proof  is  by  diagonalization.  We  will  consider  only  unary  functions;  we 
will  show  that  there  exists  at  least  one  unary  computable  function  that  is  not 
primitive  recursive. 

We first observe that it is possible to create a lexicographic enumeration of
the definitions of the unary primitive recursive functions. To do so, we first
define an alphabet Σ that contains the symbols 0, 1, the letters of the alphabet
(for use as function names), and the special characters (, ), = and comma (,).
Using the definition of the primitive recursive functions given above, we can
build a Turing machine M that decides the language of syntactically legal unary
primitive recursive functions. So, to produce the desired lexicographic
enumeration of the primitive recursive function definitions, it suffices to enumerate
lexicographically all strings in Σ* and output only those that are accepted by
M. We will choose to number the elements of this enumeration (the function
definitions) starting with 0.

Using the lexicographic enumeration of the primitive recursive function
definitions that we just described and a straightforward lexicographic enumeration of
the natural numbers (the possible arguments to those functions), we can imagine
Table 25.4, which we will call T. T[i, j] contains the value of f_i applied to j. Since
every primitive recursive function is computable, there exists a Turing machine
that can compute the value for any cell in T when it is required.

We now define the function diagonal(n) = succ(T[n, n]), which can be
computed by the following Turing machine M:

M(n) =

1. Run the Turing machine that computes f_n on n. Let the value it produces be x.

2. Return x + 1.

The function diagonal is computable (by M) but it is not in the enumeration of
primitive recursive functions since it differs from each of those in at least one
place: for every n, diagonal(n) = f_n(n) + 1 ≠ f_n(n). So there exist computable
functions that are not primitive recursive.
Table 25.4 Using diagonalization to prove that there are computable functions that are not
primitive recursive.
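
The diagonal construction can be illustrated on a small scale. In the Python sketch below, toy_enumeration is a hypothetical four-element stand-in for the infinite enumeration f_0, f_1, f_2, ... (the real enumeration is produced by filtering strings over Σ, which we do not attempt here). The point is only that diagonal disagrees with the i-th function on argument i.

```python
def diagonal_of(enumeration):
    """Return the function diagonal(n) = enumeration[n](n) + 1."""
    def diagonal(n):
        return enumeration[n](n) + 1
    return diagonal


# Hypothetical stand-in for the enumeration f_0, f_1, f_2, f_3.
toy_enumeration = [
    lambda j: 0,
    lambda j: j + 1,
    lambda j: 2 * j,
    lambda j: j * j,
]

diagonal = diagonal_of(toy_enumeration)

# Row i of the table and the diagonal function differ in column i.
disagreements = [diagonal(i) != f(i) for i, f in enumerate(toy_enumeration)]
```

Every entry of disagreements is True, so diagonal cannot be equal to any function in the list.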
25.2.2 Ackermann's Function

Now  we  know  that  there  exists  at  least  one  computable  function  that  is  not  primitive 
recursive.  But  are  there  others?  The  answer  is  yes. 

Consider Ackermann's function A, defined as follows on the domain N × N:

A(0, y) = y + 1.

A(x + 1, 0) = A(x, 1).

A(x + 1, y + 1) = A(x, A(x + 1, y)).

Table  25.5  shows  a  few  values  for  A.  Table  25.6  comments  on  some  of  the  values  in 
the  last  row  of  Table  25.5. 

So imagine that, at every second since the Big Bang, we had written one digit on
every atom in the universe. By now we would have written approximately 3 · 10^96
digits, which is not enough to have written (much less computed) A(4, 3).

Ackermann's function, unlike the busy beaver functions of Section 25.1.4, is
recursive (computable). Ignoring memory and stack overflow, it is easy to write a program
to compute it. But Ackermann's function is not primitive recursive. While it does not
grow as fast as the busy beaver functions, it does grow faster than many other
fast-growing functions like fermat. It is possible to prove that A is not primitive recursive
precisely because it grows so quickly and there is an upper bound on the rate at which
primitive recursive functions can grow.
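
As a concrete illustration of the claim that A is easy to program, here is one possible Python sketch. It replaces the recursion in the definition with an explicit stack, an implementation choice of ours made only to sidestep Python's recursion limit. It is practical only for very small arguments.

```python
def ackermann(x, y):
    """Compute A(x, y) from the three defining equations.

    The stack holds the first arguments of the calls that are still
    waiting for their second argument to be computed.
    """
    stack = [x]
    while stack:
        x = stack.pop()
        if x == 0:
            y = y + 1                 # A(0, y) = y + 1
        elif y == 0:
            stack.append(x - 1)       # A(x, 0) = A(x - 1, 1)
            y = 1
        else:
            stack.append(x - 1)       # outer call A(x - 1, ...)
            stack.append(x)           # inner call A(x, y - 1) runs first
            y = y - 1
    return y
```

The loop is a while loop with no bound known in advance, in contrast with the for loops that suffice for primitive recursive functions.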


Table 25.5 The first few values of Ackermann's function.
Table 25.6 Ackermann's function grows very fast.

(x, y)   Decimal digits required    To put that number in perspective
         to express this value

(4, 2)   19,729                     There have been about 12 · 10^9 years or
                                    3 · 10^17 seconds since the Big Bang.

(4, 3)   10^5940                    There are about 10^79 atoms in the observable
                                    universe.

(4, 4)   10^(10^5939)

So  A  is  another  example  of  a  computable  function  that  is  not  primitive  recursive. 


25.2.3 Recursive (Computable) Functions

Since there are computable functions that are not primitive recursive, we are still
looking for a way to define exactly the functions that Turing machines can compute.

We next define the class of μ-recursive functions using the same basic functions that
we used to define the primitive recursive functions. We will again allow function
composition and primitive recursion. But we will add one way of defining a new function.

We must first define a new notion: The minimalization f of a function g (of k + 1
arguments) is a function of k arguments defined as follows:

f(n_1, n_2, ..., n_k) = the smallest m such that g(n_1, n_2, ..., n_k, m) = 1, if there is such an m,

                        0, otherwise.

Clearly, given any function g and any set of k arguments to it, there either is at least
one value m such that g(n_1, n_2, ..., n_k, m) = 1 or there isn't. If there is at least one such
value, then there is a smallest one (since we are considering only the natural numbers).
So there always exists a function f that is the minimalization of g. If g is computable,
then we can build a Turing machine T_min that almost computes f as follows:

1. m = 0.

2. While g(n_1, n_2, ..., n_k, m) ≠ 1 do:

m = m + 1.

3. Return m.

The problem is that T_min will not halt if no value of m exists. There is no way for T_min
to discover that no such value exists and thus return 0.

Since we are trying to build a theory of computable functions (those for which there
exists a Turing machine that always halts), we next define the class of minimalizable
functions as follows: A function g is minimalizable iff, for every n_1, n_2, ..., n_k, there is an
m such that g(n_1, n_2, ..., n_k, m) = 1. In other words, g is minimalizable iff T_min, as defined
above, always halts.
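
The search performed by T_min can be sketched in Python as follows. The names minimalize and reaches_square are our own, chosen for illustration. The function reaches_square is minimalizable (for every n, some m satisfies m · m ≥ n), so the search always halts; applied to a function that is not minimalizable, the same loop would run forever on some inputs.

```python
from itertools import count


def minimalize(g):
    """Return f where f(*args) is the smallest m with g(*args, m) == 1.

    Like T_min, the returned function loops forever if no such m exists.
    """
    def f(*args):
        for m in count():             # unbounded search: m = 0, 1, 2, ...
            if g(*args, m) == 1:
                return m
    return f


def reaches_square(n, m):
    """Return 1 if m * m >= n and 0 otherwise."""
    return 1 if m * m >= n else 0


# The minimalization of reaches_square rounds the square root of n up.
ceil_sqrt = minimalize(reaches_square)
```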
We define the μ-recursive functions to be the smallest class of functions from
N × N × ... × N to N that includes:

• the constant function 0,

• the successor function: succ(n) = n + 1, and

• the family of projection functions: For any k ≥ j > 0, p_{k,j}(n_1, n_2, ..., n_k) = n_j,

and that is closed under the operations:

• composition of g with h_1, h_2, ..., h_k,

• primitive recursion of f in terms of g and h, and

• minimalization of minimalizable functions.

A good way to get an intuitive understanding of the difference between the
primitive recursive functions and the μ-recursive functions is the following:

• In the computation of any primitive recursive function, iteration is always bounded:
it can be implemented with a for loop that runs for n_k steps, where n_k is the value of
the last argument to f. So, for example, computing times(2, 3) requires invoking plus
three times.

• In the computation of a μ-recursive function, on the other hand, iteration may
require the execution of a while loop like the one in T_min. So it is not always possible to
impose a bound, in advance, on the number of steps required by the computation.

THEOREM 25.7 Equivalence of μ-Recursion and Computability

Theorem: A function is μ-recursive iff it is computable.

Proof: We must show both directions:

• Every μ-recursive function is computable. We show this by showing how to
build a Turing machine for each of the basic functions and for each of the
combining operations.

• Every computable function is μ-recursive. We show this by showing how to
construct μ-recursive functions to perform each of the operations that a
Turing machine can perform.

We  will  omit  the  details  of  both  of  these  steps.  They  are  straightforward  but 
tedious. 

We  have  now  accomplished  our  first  goal.  We  have  a  functional  definition  for  the 
class  of  computable  functions. 

It is worth pointing out here why the same diagonalization argument that we used in
the case of primitive recursive functions cannot be used again to show that there must
exist some computable function that is not μ-recursive. The key to the argument in the
case of the primitive recursive functions was that it was possible to create a lexicographic
enumeration of exactly the primitive recursive function definitions. The reason it was
possible is that a simple examination of the syntax of a proposed function tells us whether or
not it is primitive recursive. But now consider trying to do the same thing to decide
whether a function f is μ-recursive. If f is defined in terms of the minimalization of some
other function g, we would first have to check to see whether g is minimalizable. To do
that, we would need to know whether T_min halts on all inputs. That problem is undecidable.

So there exists no lexicographic enumeration of the μ-recursive functions.

Next we will attempt to find a functional definition for the class of partially
computable functions. We define the partial μ-recursive functions to be the smallest class
of functions from N × N × ... × N to N that includes:

• the constant function 0,

• the successor function: succ(n) = n + 1, and

• the family of projection functions: For any k ≥ j > 0, p_{k,j}(n_1, n_2, ..., n_k) = n_j,

and that is closed under the operations:

• composition of g with h_1, h_2, ..., h_k,

• primitive recursion of f in terms of g and h, and

• minimalization.

The only difference between this definition and the one that we gave for the
μ-recursive functions is that we now allow minimalization of any function, not just the
minimalizable ones. A function that is defined in this way may, therefore, not be a total
function. So it is possible that there exists no Turing machine that computes it and that
always halts.

THEOREM 25.8 Equivalence of Partial μ-Recursion and Partial Computability

Theorem: A function is a partial μ-recursive function iff it is partially computable.

Proof: We must show both directions:

• Every partial μ-recursive function is partially computable. We show this by
showing how to build a Turing machine for each of the basic functions and for
each of the combining operations. Note that the Turing machine that
implements the minimalization of a function that is not minimalizable will not be
guaranteed to halt on all inputs.

• Every partially computable function is partial μ-recursive. We show this by
showing how to construct μ-recursive functions to perform each of the
operations that a Turing machine can perform.

We will omit the details of both of these steps. They are straightforward but
tedious.
25.3 The Recursion Theorem and Its Use

In this section, we prove the existence of a very useful computable (recursive)
function: obtainSelf. When called as a subroutine by any Turing machine M, obtainSelf
writes onto M's tape the string encoding of M.

We  begin  by  asking  whether  there  exists  a  Turing  machine  that  implements  the  fol¬ 
lowing  specification: 

virus() =

1.  For  each  address  in  address  book  do: 

1.1.  Write  a  copy  of  myself. 

1.2.  Mail  it  to  the  address. 

2.  Do  something  fun  and  malicious  like  change  one  bit  in  every  file  on  the  machine. 

3.  Halt. 

In particular, can we implement step 1.1 and build a program that writes a copy of
itself? That seems simple until we try. A program that writes any literal string
a = "a_1 a_2 a_3 ... a_n" is simply:

• Write "a_1 a_2 a_3 ... a_n".

But, using that simple string encoding of a program, this program is 8 characters longer
than the string it writes. So if we imagined that our original code had length k then the
program to write it would have length k + 8. But if that code contained the write
statement, we would need to write:

• Write "Write" || "a_1 a_2 a_3 ... a_n".

But  now  we  need  to  write  that,  and  so  forth.  Perhaps  this  seems  hopeless.  But  it  is  not. 
First,  let’s  rearrange  virus  a  little  bit: 

virus() =

1. copyme = copy of myself.

2.  For  each  address  in  address  book  do: 

2.1.  Mail  copyme  to  the  address. 

3. Do something fun and malicious like change one bit in every file on the
machine.

4.  Halt. 

If virus can somehow get a single copy of itself onto its tape, a simple loop (of fixed
length, independent of the length of the copy) can make additional copies, which can
then be treated like any other string. The problem is for virus to get access to that first
copy of itself. Here’s how we can solve that problem.

First, we will define a family of printing functions, P_s. For any literal string s, P_s is the
description of a Turing machine that writes the string s onto the tape. Think of s as
being hardwired into P_s. For example, P_abbb = <aRbRbRbR>. Notice that the length
of the Turing machine P_s depends on the length of s.

Next we define a Turing machine, createP, that takes a string s as input on one tape
and outputs the printing function P_s on a second tape:

createP(s) =

1. For each character c in s (on tape 1) do on tape 2:

1.1.  Write  c. 

1.2.  Write  R. 

Notice that the length of createP is fixed. It does not need separate code for each
character of s. It has just one simple loop that reads the characters of s one at a time
and outputs two characters for each.
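
The loop in createP is short enough to write out. The Python sketch below treats the two tapes as an input string and an output list, which is a simplification of ours; the function name create_p is also ours. It reproduces the P_abbb example given earlier.

```python
def create_p(s):
    """Return the body of the description of P_s.

    For each character c of s, emit c (write c) followed by R (move right).
    """
    tape2 = []
    for c in s:                 # one fixed loop, whatever the length of s
        tape2.append(c)         # 1.1. Write c.
        tape2.append("R")       # 1.2. Write R.
    return "".join(tape2)
```

The code of create_p has the same length no matter how long s is, while its output is always twice as long as s.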

Now  let’s  break  virus  down  into  two  parts: 

• Step 1: We'll call this step copy. It writes on the tape a string that is the description
of virus.

• Steps 2, 3, and 4, or whatever else virus wants to do: We'll call this part work. This
part begins with virus's description on the tape and does whatever it wants with it.

We will further break step 1, copy, down into two pieces that we will call A and B. A
will execute first. Its job will be to write <B, work>, the description of B and work,
onto the tape. The string <B, work> will be hardwired into A, so the length of A itself
depends on |<B, work>|. When A is done, the tape will be as shown in Figure 25.2(a).

The job of B will be to write <A>, the description of A, onto the tape immediately
to the left of what A wrote. So, after B has finished, the job of copying virus will be
complete and the tape will be as shown in Figure 25.2(b).

Suppose that we knew exactly what B was. Then A would be P_<B><work>. Assuming
that we describe A in our macro language, |<A>| would then be 2 · |<B><work>|,
since for each character it must write and then move one square to the right. But what
is B? It must be a machine that writes <A>. And its length must be fixed. It cannot
depend on the length of <A>, since then the length of <A> would depend on the
length of <B>, which would depend on the length of <A> and so forth. So it cannot
just be P_<A>.

<B> <work>

(a)

<A> <B> <work>

(b)

FIGURE 25.2 The result of running A and then B.

Fortunately, we know how to build B so that it writes <A> and does so with a fixed
chunk of code, independent of the length of A. Given any string s on tape 1, createP
writes, onto a second tape, the description of a Turing machine that writes s. And it does
so with a fixed length program. A is a program that writes a string. So perhaps B could
use createP to write a description of A. That will work if B has access to the string s that
A wrote. But it does. A wrote <B><work>, which is exactly what is on the tape when
B gets control. So we have (expanding out the code for createP):

B  = 

1.  /*  Invoke  createP  to  write  onto  tape  2  the  code  that  writes  the  string  that  is 
currently  on  tape  1. 

For  each  character  c  in  s  (on  tape  1)  do  on  tape  2: 

1.1.  Write  c. 

1.2.  Write  R. 

2. /* Copy tape 2 to tape 1, moving right to left. Place this copy to the left of what
is already on tape 1.

Starting  at  the  rightmost  character  c  on  tape  2  and  the  blank  immediately  to  the 
left  of  the  leftmost  character  on  tape  1,  loop  until  all  characters  have  been 
processed: 

2.1.  Copy  c  to  tape  1. 

2.2.  Move  both  read/write  heads  one  square  to  the  left. 

So  the  code  for  B  (unlike  the  code  for  A)  is  independent  of  the  particular  Turing 
machine  of  which  we  need  to  make  a  copy. 

When B starts, the two tapes will be as shown in Figure 25.3(a). After step 1, they
will be as shown in Figure 25.3(b). Remember that <A> is the description of a
Turing machine that writes <B><work>. Then, after step 2, they will be as shown
in Figure 25.3(c).

<B> <work>

(a)

<B> <work>
<A>

(b)

<A> <B> <work>
<A>

(c)

FIGURE 25.3 The tape before, during, and after the execution of B.

Notice that the code for B is fixed. It first writes <A> onto tape 2 using a simple
loop. Then, starting from the right, it copies <A> onto tape 1 just to the left of the
string <B><work> that was already there. Again, it does this with a simple loop.

Now we can describe virus exactly as follows. Recall that <M> means the string
description, written in the macro language described in Section 17.6, of the Turing
machine M. So <B> is the description of the Turing machine labeled B here:

virus() =

A: Write on tape 1 <B> <work>.

B: /* createP, which will write onto tape 2 the code that writes the string that
is currently on tape 1.

For each character c in s (on tape 1) do on tape 2:

Write c.

Write R.

/* Copy tape 2 to tape 1, moving right to left. Place this copy to the left of
what is already on tape 1.

Starting at the rightmost character c on tape 2 and the blank immediately
to the left of the leftmost character on tape 1, loop until all characters have
been processed:

Copy  c  to  tape  1. 

Move  both  read/write  heads  one  square  to  the  left. 

work. 

Or, more succinctly, using P_s and createP:

virus() =

A: P_<B><work>.
B: createP.

Copy tape 2 to tape 1.
work.
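
The same two-part construction carries over to ordinary programming languages. The following Python sketch is an analogy under our own assumptions, not a Turing machine: the variable data plays the role of tape 1, the function print_program plays the role of P_s, and all of the names are ours. Part A stores the text of B and work in data. Part B rebuilds the text of A from data and places it to the left of what A wrote.

```python
def print_program(s):
    """Analogue of P_s: the text of a program that stores s in `data`."""
    return "data = " + repr(s) + "\n"


# Analogue of <B><work>. B knows how print_program formats its output,
# just as the Turing machine B has the createP loop built into it.
B_AND_WORK = (
    "me = 'data = ' + repr(data) + '\\n' + data\n"   # B: rebuild <A>, prepend it
    "copies = [me, me]\n"                            # work: use the description
)

# Analogue of <A><B><work>: A is print_program applied to the rest.
VIRUS = print_program(B_AND_WORK) + B_AND_WORK

env = {}
exec(VIRUS, env)    # run the program; env["me"] is its own complete text
```

After the program runs, env["me"] equals VIRUS. Notice that the text of B does not depend on what work does, while the text of A does.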

The construction that we just did for virus is not unique to it. In fact, that
construction enables us to describe the function obtainSelf, which we mentioned at the
beginning of this section. Let M be a Turing machine composed of two steps:

1. obtainSelf.

2. work (which may exploit the description that obtainSelf produced).

Then we can define obtainSelf, which constructs <M>:

obtainSelf(work) =

A: P_<B><work>.
B: createP.

Copy tape 2 to tape 1.

The Recursion Theorem, stated below, tells us that any Turing machine can obtain
its own description and then use that description as it sees fit. There is one issue that we
must confront in showing that, however. Virus ignored its input. But many Turing
machines don’t. So we need a way to write the description of a Turing machine M onto its
tape without destroying its input. This is easy. If M is a k-tape Turing machine, we build
a k + 2 tape machine, where the extra two tapes are used, as we have just described, to
create a description of M.

THEOREM  25.9  The  Recursion  Theorem 

Theorem: For every Turing machine T that computes a partially computable
function t of two string arguments, there exists a Turing machine R that computes a
partially computable function r of one string argument and:

∀x (r(x) = t(<R>, x)).

To  understand  the  recursion  theorem,  it  helps  to  see  an  example.  Recall  the  Turing 
machine  that  we  specified  in  our  proof  of  Theorem  21.14  (that  the  language  of 
descriptions  of  minimal  Turing  machines  is  not  in  SD): 

M#(x) =

1. Invoke obtainSelf to produce <M#>.

2. Run ENUM until it generates the description of some Turing machine M'
whose description is longer than |<M#>|.

3. Invoke the universal Turing machine U on the string <M', x>.

Steps 2 and 3 are the guts of M# and correspond to a Turing machine T that takes two
arguments, <M#> and x, and computes a function we can call t. M#, on the other
hand, takes a single argument, x. But M#(x) is exactly T(<M#>, x) because in
step 1, M# constructs <M#>, which it then hands to T (i.e., steps 2 and 3). So,
given that we wish to compute T(<M#>, x), M# is the Turing machine R that the
recursion theorem says must exist. The only difference between R and T is that R
constructs its own description and then passes that description, along with its own
argument, on to T. Since, for any T, R must exist, it must always be possible for R
to construct its own description and pass it to T.

Proof:  The  proof  is  by  construction.  The  construction  is  identical  to  the  one  we 
showed  above  in  our  description  of  virus  except  that  we  substitute  T  for  work. 
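
To see the statement r(x) = t(<R>, x) in action, consider the following Python sketch. It is an analogy under our own assumptions: program text plays the role of a Turing machine description, build_r is a name of ours, and the function t shown is an arbitrary example. Given the text of any two-argument function t, build_r produces the text of a program R whose function r obtains R's own text and passes it, together with x, to t.

```python
def build_r(t_source):
    """Return the text of a program R that defines r(x) = t(<R>, x).

    t_source must be the text of a definition of t(description, x).
    """
    tail = (
        t_source
        + "def r(x):\n"
        + "    me = 'data = ' + repr(data) + '\\n' + data\n"  # obtain <R>
        + "    return t(me, x)\n"                             # hand it to t
    )
    # The first line of R stores the rest of R in `data`.
    return "data = " + repr(tail) + "\n" + tail


# An arbitrary example t: report the length of the description and echo x.
T_SOURCE = "def t(description, x):\n    return (len(description), x)\n"

R_TEXT = build_r(T_SOURCE)
env = {}
exec(R_TEXT, env)
answer = env["r"]("abc")    # r(x) computed by R itself
```

Here answer is the pair consisting of the length of R_TEXT and the string "abc", which is exactly t applied to R's own text and x. Nothing in build_r depends on what t does.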

The Recursion Theorem is sometimes stated in a different form, as a fixed-point
theorem. We will state that version as a separate theorem whose proof follows from the
recursion theorem as just stated and proved.


THEOREM 25.10 The Fixed-Point Definition of the Recursion Theorem

Theorem: Let f: {<M> : M is a Turing machine description} → {<M> : M is a
Turing machine description} be any computable function on the set of Turing
machine descriptions. There exists some Turing machine F such that f(<F>) is
the description of some Turing machine G and it is the case that F and G are
equivalent (i.e., they behave identically on all inputs). We call F a fixed point of
the function f, since it does not change when f is applied to it.
Proof: The Turing machine F that we claim must exist is:

F(x) =

1. Invoke obtainSelf to produce <F>.

2. Since f is a computable function, there must be some Turing machine M_f
that computes it. Invoke M_f(<F>), which produces the description
of some Turing machine we can call G.

3. Run G on x.

Whatever f is, f(<F>) = <G>. F and G are equivalent since, on any input x,
F halts exactly when G would halt and it leaves on its tape exactly what G leaves.


This theorem says something interesting and, at first glance perhaps,
counterintuitive. Let's consider again the virus program that we described above. In its work
section, it changes one bit in every file on its host machine. Consider the files that
correspond to programs. Theorem 25.10 says that there exists at least one program
whose behavior will not change when it is altered in that way. Of course, most
programs will change. That is why virus can be so destructive. But there is not only one
fixed point for virus, there are many, including:

• Every program that infinite loops on all inputs and where the bit that f changes
comes after the section of code that went into the loop.

•  Every  program  that  has  a  chunk  of  redundant  code,  such  as: 

a  =  5 
a  =  7 

where  the  bit  that  gets  changed  is  in  the  first  value  that  is  assigned  and  then 
overwritten. 

• Every program that has a branch that can never be reached and where the bit that
f changes is in the unreachable chunk of code.

We have stated and proved the Recursion Theorem in terms of the operation of
Turing machines. It can also be stated and proved in the language of recursive functions.
When done this way, its proof relies on another theorem that is interesting in its own
right. We state and prove it next. To do so, we need to introduce a new technique for
describing functions since, so far, we have described them as strings (i.e., the string
encodings of the Turing machines that compute them). Yet the theory of recursive functions
is a theory of functions on the natural numbers.

We define the following one-to-one function Gödel that maps from the set of Turing
machines to the positive integers: Let M be a Turing machine that computes some
partially computable function. Let <M> be the string description of M, using the
encoding mechanism that we defined in Section 17.6.1. That encoding scheme used eleven
symbols, which can be encoded in binary using four bits. Rewrite <M> as a binary
string. Now view that string as the number it encodes. We note that Gödel is a function
(since each Turing machine is assigned a unique number); it is one-to-one (since no two
Turing machines are assigned the same number); but it is not onto (since there are
numbers that do not encode any Turing machine). A one-to-one function that assigns
natural numbers to objects is called a Gödel numbering, since the technique was
introduced by Kurt Gödel. It played a key role in the proof of his Incompleteness Theorem.
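
The numbering scheme can be sketched in a few lines of Python. The eleven-symbol ALPHABET below is a hypothetical placeholder of ours, not the actual symbol set of Section 17.6.1, and the choice of the nonzero codes 1 through 11 is also an assumption, made so that leading symbols are never lost when the bit string is read as a number.

```python
# Hypothetical 11-symbol alphabet (a placeholder, not the book's symbol set).
ALPHABET = "()aqyn01,<>"

# Four-bit codes 1..11; code 0 is left unused so leading symbols survive.
CODE = {symbol: index + 1 for index, symbol in enumerate(ALPHABET)}
SYMBOL = {code: symbol for symbol, code in CODE.items()}


def godel_number(description):
    """View the four-bits-per-symbol encoding of description as an integer."""
    number = 0
    for symbol in description:
        number = (number << 4) | CODE[symbol]
    return number


def decode(number):
    """Invert godel_number, or return None if number encodes no string."""
    symbols = []
    while number:
        code = number & 0b1111
        if code not in SYMBOL:        # 0 and 12..15 are not symbol codes
            return None
        symbols.append(SYMBOL[code])
        number >>= 4
    return "".join(reversed(symbols))
```

Distinct strings receive distinct numbers, so the mapping is one-to-one. It is not onto, since decode returns None for numbers such as 15. A further check, omitted here, would be needed to reject strings that are not syntactically legal descriptions.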

We’ll now create a second Gödel numbering, this time of the partial recursive
functions. For each such function, assign to it the smallest number that has been assigned to
some Turing machine that computes it. Now define:

φ_k to be the partially computable function with Gödel number k.

Notice that since functions are now represented as numbers, it is straightforward to
talk about functions whose inputs and/or outputs are other functions. We’ll take
advantage of this and describe our next result. Suppose that f(x_1, x_2, ..., x_m, y_1, y_2, ..., y_n) is
an arbitrary function of m + n arguments. Then we’ll see that it is always possible,
whenever we fix values for x_1, x_2, ..., x_m, to create a new function f' of only n
arguments. The new function f' will behave as though it were f with the fixed values
supplied for the first m arguments. One way to think of f' is that it encapsulates f and a set
of values v_1, v_2, ..., v_m. We’ll show that there exists a family of functions, one for each
pair of values m and n, that, given f and v_1, v_2, ..., v_m, creates f' as required.


THEOREM 25.11 The s-m-n Theorem

Theorem: For all m, n ≥ 1, there exists a computable function s_{m,n} with the
following property: Let k be the Gödel number of some partially computable function
of m + n arguments. Then, for all k, v_1, v_2, ..., v_m, y_1, y_2, ..., y_n:

• s_{m,n}(k, v_1, v_2, ..., v_m) returns a number j that is the Gödel number of some
partially computable function of n arguments, and

• φ_j(y_1, y_2, ..., y_n) = φ_k(v_1, v_2, ..., v_m, y_1, y_2, ..., y_n).

Proof: We will prove the theorem by defining a family of Turing machines M_{m,n}
that compute the s_{m,n} family of functions. On input (k, v_1, v_2, ..., v_m), M_{m,n} will
construct a new Turing machine M_j that operates as follows on input w: Write
v_1, v_2, ..., v_m on the tape immediately to the left of w; move the read/write head
all the way to the left in front of v_1; and pass control to the Turing machine
encoded by k. M_{m,n} will then return j, the Gödel number of the function
computed by M_j.


The s-m-n Theorem has important applications in the design of functional
programming languages. (G.5) In particular, it is the basis for currying, which
implements the process we have just described. When a function of n > 0
arguments is curried, one or more of its arguments are fixed and a new
function, of fewer arguments, is constructed.
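
The construction in the proof has a direct analogue for program text. In the Python sketch below, which rests on our own assumptions, program text stands in for a Gödel number, and the names s_1_1 and f_fixed are ours. The function s_1_1 handles the case m = n = 1: given the text of a program that defines f(x, y) and a value v, it returns the text of a new program whose function f_fixed(y) supplies v as the first argument and then passes control to f.

```python
def s_1_1(program_text, v):
    """Given text defining f(x, y), return text defining f_fixed(y) = f(v, y).

    The value v is written into the new program as a literal, just as the
    constructed Turing machine writes v onto its tape before running f.
    """
    return (
        program_text
        + "\ndef f_fixed(y):\n"
        + f"    return f({v!r}, y)\n"
    )


# An arbitrary example program of two arguments.
F_TEXT = "def f(x, y):\n    return 10 * x + y\n"

G_TEXT = s_1_1(F_TEXT, 4)     # fix the first argument at 4
env = {}
exec(G_TEXT, env)
```

Now env["f_fixed"](2) computes f(4, 2). Note that s_1_1 itself never runs f. It only manipulates text, which is why it is computable even when f is only partially computable.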
Exercises

1. Define the function pred(x) as follows:

pred: N → N,

pred(x) = x - 1.

a. Is pred a total function on N?

b. If not, is it a total function on some smaller, decidable domain?

c. Show that pred is computable by defining an encoding of the elements of N as
strings over some alphabet Σ and then showing a Turing machine that halts
on all inputs and that computes either pred or pred' (using the notion of a
primed function as described in Section 25.1.2).

2. Prove that every computable function is also partially computable.

3. Consider f: A → N, where A ⊆ N. Prove that, if f is partially computable, then A
is semidecidable (i.e., Turing enumerable).

4.  Give  an  example,  other  than  steps,  of  a  function  that  is  partially  computable  but 
not  computable. 

5. Define the function countL(<M>) as follows:

countL: {<M> : M is a Turing machine} $\to \mathbb{N} \cup \{\aleph_0\}$,

countL(<M>) = the number of input strings that are accepted by M.

a. Is countL a total function on {<M> : M is a Turing machine}?

b. If not, is it a total function on some smaller, decidable domain?

c. Is countL computable, partially computable, or neither? Prove your answer.

6. Give an example, other than any mentioned in the book, of a function that is not
partially computable.

7. Let g be some partially computable function that is not computable. Let h be
some computable function and let f(x) = g(h(x)). Is it possible that f is a computable
function?

8. Prove that the busy beaver function $\Sigma$ is not computable.

9. Prove that each of the following functions is primitive recursive:

a. The function double(x) = 2x.

b. The proper subtraction function monus, which is defined as follows:

$$\mathit{monus}(n, m) = \begin{cases} n - m & \text{if } n > m \\ 0 & \text{if } n \leq m \end{cases}$$

c. The function half, which is defined as follows:

$$\mathit{half}(n) = \begin{cases} n/2 & \text{if } n \text{ is even} \\ (n - 1)/2 & \text{if } n \text{ is odd} \end{cases}$$


10. Let A be Ackermann's function. Verify that A(4, 1) = 65533.


CHAPTER  26 


Summary  and  References 


One  way  to  think  about  what  we  have  done  in  Part  IV  is  to  explore  the  limits  of  com¬ 
putation.  We  have  considered  many  different  models  of  “the  computable."  All  of  them 
were  described  and  studied  by  people  who  were  trying  to  answer  the  question,  “What 
can  we  compute?"  Some  of  the  models  look  similar.  For  example,  Post  production  systems 
and  unrestricted  grammars  both  define  languages  by  providing  a  start  symbol  and  a  set 
of production rules that rewrite one string into another. While there are differences
(Post systems exploit variables and must match entire strings while unrestricted grammars
use only constants and can match substrings), it turns out that the two formalisms
are equivalent: They both define exactly the class of languages that we are calling SD.
Similarly,  Turing  machines  and  tag  systems  look  similar.  One  uses  a  tape  with  a 
moveable  read/write  head,  the  other  uses  a  first-in,  first-out  queue.  But  that  difference 
also  turns  out  not  to  matter.  A  machine  of  either  kind  can  be  simulated  by  a  machine  of 
the  other  kind. 

Some of the models look very different. Turing machines seem like sequential
computers. Expressions in the lambda calculus read like mathematical function definitions.
Unrestricted grammars are rewrite systems. One of the most important structural
differences is between the models (such as Turing machines, tag systems, the
lambda calculus, semi-Thue systems, and Markov algorithms) that accept inputs,
and so compute functions, and those (such as unrestricted grammars, Post systems,
and Lindenmayer systems) that include a start symbol and so generate languages.
But all of these systems can be viewed as mechanisms for defining languages. The
generating systems generate languages; the function-computation systems compute a
language's characteristic function. So even that difference doesn't affect the bottom
line of what is computable.

Another thing that we did in Part IV was to introduce three new classes of languages:
D, SD, and the context-sensitive languages. Table 26.1 summarizes
the properties of those languages and compares them to the regular and the context-free
languages.
Table 26.1 Comparing the classes of languages.

                 Regular               Context-Free   Context-Sensitive   D     SD
Automaton        FSM                   PDA            LBA                       TM
Grammar(s)       Regular expressions   Context-free   Context-sensitive         Unrestricted
ND = D?          Yes                   No             unknown                   Yes
Closed under
  Concatenation  Yes                   Yes
PART V

COMPLEXITY 


In Part IV we described the distinction between problems that are theoretically
solvable and ones that are not. In this section, we will take another look at
the class of solvable problems and further distinguish among them. In particular,
we will contrast problems that are "practically solvable", in the sense
that programs that solve them have resource requirements (in terms of time
and/or space) that can generally be met, and problems that are "practically
unsolvable", at least for large inputs, since their resource requirements grow
so quickly that they cannot typically be met. Throughout our discussion, we
will generally assume that if resource requirements grow as some polynomial
function of problem size, then the problem is practically solvable. If they grow
faster than that, then, for all but very small problem instances, the problem
will generally be practically unsolvable.


CHAPTER  27 


Introduction  to  the  Analysis 
of  Complexity 

Once  we  know  that  a  problem  is  solvable  (or  a  language  is  decidable  or  a  func¬ 
tion  is  computable),  we're  not  done.  The  next  step  is  to  find  an  efficient  algo¬ 
rithm  to  solve  it. 

27.1  The  Traveling  Salesman  Problem 

The traveling salesman problem (or TSP for short) is easy to state: Given n cities and
the distances between each pair of them, find the shortest tour that returns to its starting
point and visits each other city exactly once along the way. We can solve this problem
using the straightforward algorithm that first generates all possible paths that meet
the requirements and then returns the shortest one. Since we must make a loop through
the cities, it doesn’t matter what city we start in. So we can pick any one. If there are n
cities, there are $n - 1$ cities that could be chosen next, $n - 2$ that can be chosen
after that, and so forth. So, given n cities, the number of different tours is $(n - 1)!$. We
can cut the number of tours we examine in half by recognizing that the cost of a tour is
the same whether we traverse it forward or backward. That still leaves $(n - 1)!/2$ tours
to consider. So this approach quickly becomes intractable as the number of cities grows.
To see why, consider the following set of observations: The speed of light is $3 \cdot 10^{8}$ m/sec.
The width of a proton is $10^{-15}$ m. So, if we perform one operation in the time it takes
light to cross a proton, we can perform $3 \cdot 10^{23}$ operations/sec. There have been about
$3 \cdot 10^{17}$ seconds since the Big Bang. So, at that rate, we could have performed about
$9 \cdot 10^{40}$ operations since the Big Bang. But 36! is about $3.7 \cdot 10^{41}$. So there hasn't been enough
time since the Big Bang to have solved even a single traveling salesman problem with 37
cities. That's fewer than one city per state in the United States.
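
The arithmetic above can be checked with a short sketch. The constants are the rounded figures quoted in the paragraph; the function names `tours` and `largest_feasible` are hypothetical helpers introduced only for this illustration, and the sketch assumes one operation per tour examined.

```python
import math

# Rounded figures quoted in the text.
OPS_PER_SEC = 3 * 10**8 * 10**15          # light crossing a proton: 3 * 10^23
SECONDS_SINCE_BIG_BANG = 3 * 10**17
TOTAL_OPS = OPS_PER_SEC * SECONDS_SINCE_BIG_BANG   # 9 * 10^40


def tours(n):
    """Number of distinct tours of n cities, counting each direction once."""
    return math.factorial(n - 1) // 2


def largest_feasible(budget):
    """Largest n for which every tour could be examined within the budget."""
    n = 3
    while tours(n + 1) <= budget:
        n += 1
    return n


print(largest_feasible(TOTAL_OPS))
```

Even after halving the count of tours, the budget of $9 \cdot 10^{40}$ operations is exhausted before 37 cities.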
One early application of work on the TSP was of concern to farmers rather
than salesmen. The task was to conduct a survey of farmlands in Bengal in
1938. One goal of the survey planners was to minimize the cost of transporting
the surveyors and their equipment from one place to the next. Another
early application was the scheduling of school bus routes so that all the stops
were visited and the travel distance among them was minimized.


Of course, one way to make more computations possible is to exploit parallelism.
For example, there are about $10^{11}$ neurons in the human brain. If we think of them as
operating independently, then they can perform $10^{11}$ computations in parallel. Each of
them is very slow. But if we imagined the fast operation we described above being performed
by $10^{11}$ computers in parallel, then there would have been time for $9 \cdot 10^{51}$
operations since the Big Bang. But 43! is about $6 \cdot 10^{52}$. So we still could not have solved an instance
of the TSP with one city per state.


In this century, manufacturing applications of the TSP are important. Consider
the problem of drilling a set of holes on a board. To minimize manufacturing
time, it may be important to minimize the distance that must be
traveled by the drill as it moves from one hole to the next.


Over 50 years of research on the traveling salesman problem have led to techniques
for reducing the number of tours that must be examined. For example, a dynamic programming
approach that reuses partial solutions leads to an algorithm that solves any
TSP instance with n cities in time that grows only as $n^2 2^n$. For large n, that is substantially
better than $(n - 1)!$. But it still grows exponentially with n and is not efficient
enough for large problems. Despite substantial work since the discovery of that approach,
there still exists no algorithm that can be guaranteed to solve an arbitrary instance
of the TSP exactly and efficiently. We use the term efficiently here to mean that
the time required to execute the algorithm grows as no more than some polynomial
function of the number of cities. Whether or not such an efficient algorithm exists is
perhaps the most important open question in theoretical computer science. We'll have
a lot more to say about this question, which is usually phrased somewhat differently:
“Does P = NP?”
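
A dynamic programming approach of the kind mentioned above can be sketched as follows. The text does not name or spell out the algorithm; this sketch follows the standard formulation usually attributed to Held and Karp, and it assumes the instance is given as a complete distance matrix `dist` (an assumption of this example).

```python
from itertools import combinations


def held_karp(dist):
    """Return the length of a shortest tour over an n x n distance matrix."""
    n = len(dist)
    if n == 1:
        return 0
    # best[(mask, j)]: cheapest path that starts at city 0, visits exactly
    # the cities in the bitmask `mask` (city 0 excluded), and ends at city j.
    best = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev = mask & ~(1 << j)
                # Reuse the stored partial solutions for the smaller subset.
                best[(mask, j)] = min(best[(prev, i)] + dist[i][j]
                                      for i in subset if i != j)
    full = (1 << n) - 2  # every city except city 0
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))


example = [[0, 10, 15, 20],
           [10, 0, 35, 25],
           [15, 35, 0, 30],
           [20, 25, 30, 0]]
print(held_karp(example))
```

There are about $n 2^n$ table entries and each is computed from at most n smaller entries, which gives the $n^2 2^n$ growth rate.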

So we do not have a technique for solving the TSP that is efficient and that is guaranteed
to find the optimal solution for all problem instances. But suppose that we can
compromise. Then it turns out that:

1. there are techniques that are guaranteed to find an optimal solution and that run
efficiently on many (although not all) problem instances, and

2. there are techniques that are guaranteed to find a good (although not necessarily
optimal) solution and to do so efficiently.
TSP solvers that make the first compromise exploit the idea of linear programming.
Given a problem P, a solver of this sort begins by setting up a relaxed version
of P (i.e., one in which it is not necessary to satisfy all of the constraints
imposed by the original problem P). Then it uses the optimization techniques of linear
programming to solve this relaxed problem efficiently. The solution that it finds
at this step is optimal for the relaxed problem, and so is at least as good as any
solution to the original problem P, but it may not be a legal solution to P. If it is,
the process halts with the best tour. If
the solution to the relaxed problem is not also a solution to P, it can be used to
make a “cut” in the space of possible solutions. The cut is a new linear constraint
with the property that the solution that was just found and rejected is on one side of
the constraint while all possible solutions to the original problem P are on the other.
Ideally, of course, many other candidate solutions that would also have to be rejected
will also be on the wrong side of the cut. The cut is then added and a new linear programming
problem, again a relaxed (but this time less relaxed) version of P, is
solved. This process continues until it finds a solution that meets the constraints of
the original problem P. In the worst case, only a single solution will be eliminated
every time and an exponential number of tours will have to be considered. When
the data come from real problems, however, it usually turns out that the algorithm
performs substantially better than that. In 1954, when this idea was first described, it
was used to solve an instance of the TSP with 49 cities. Since then, computers have
gotten faster and the technique has been improved. In 2004, the Concorde TSP
solver, a modern implementation of this idea, was used to find the optimal route
that visits 24,978 cities in Sweden.

But what about the second compromise? It often doesn’t make sense to spend
months finding the perfect tour when a very good one could be found in minutes. Further,
if we're solving a problem based on real distances, then we’ve already approximated
the problem by measuring the distances to some finite precision. The notion of
an  exact  optimal  solution  is  theoretically  well  defined,  but  it  may  not  be  very  important 
for  real  problems. 

If we are willing to accept a “good” solution, then there are reasonably efficient algorithms
for solving the TSP. For example, suppose that the distances between the
cities satisfy the triangle inequality (i.e., given any three cities a, b, and c, the length of
the path that goes directly from a to b is less than or equal to the length of the path
that goes from a to c and then to b). If the cities are laid out on a plane and if the distances
between them correspond to Euclidean distance (i.e., the standard measure of
distance in the plane), then this constraint is met. Then there is a polynomial-time algorithm
that finds a minimum spanning tree (as described in Section 28.1.6) for the
city graph and uses it to construct a tour whose length is no more than twice the
length of an optimal tour. And there is a more sophisticated algorithm that constructs
a tour whose distance is guaranteed to be no more than 1.5 times that of the optimal
one. So, for all such real-world problems, we have a “pretty good” efficient algorithm.
But we’d like to do better and we usually can. For example, a solution that is known to
be no more than 0.1% longer than an optimal tour has been found for a problem with
1,904,711 cities.
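
The spanning-tree technique can be sketched as follows. The text does not give the construction in detail; this is the usual textbook version (build a minimum spanning tree, then visit the cities in a preorder walk of the tree), written under the assumption that cities are points in the plane with Euclidean distances. The names `mst_tour` and `tour_length` are hypothetical.

```python
import math


def mst_tour(points):
    """Return a tour (list of indices) built from a minimum spanning tree."""
    n = len(points)

    def d(a, b):
        return math.dist(points[a], points[b])

    # Prim's algorithm rooted at city 0: cost[v] = (distance to tree, parent).
    cost = {v: (d(0, v), 0) for v in range(1, n)}
    children = {v: [] for v in range(n)}
    while cost:
        v = min(cost, key=lambda u: cost[u][0])
        _, parent = cost.pop(v)
        children[parent].append(v)
        for u in cost:
            if d(v, u) < cost[u][0]:
                cost[u] = (d(v, u), v)
    # Preorder walk of the tree. Shortcutting past already-visited cities
    # cannot lengthen the walk when the triangle inequality holds.
    tour, stack = [], [0]
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour


def tour_length(points, tour):
    """Length of the closed tour that visits the points in the given order."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


square = [(0, 0), (0, 1), (1, 1), (1, 0)]
print(tour_length(square, mst_tour(square)))
```

The factor of two comes from the fact that walking around the tree traverses each tree edge twice, and the tree is no longer than an optimal tour with one edge removed.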

In several important ways, the TSP is representative of a much larger collection of
problems that are of substantial practical interest. As we consider these problems
and look for efficient algorithms to solve them, we'll typically consider the following
two important questions:

1. What do we mean by efficiency? In particular, are we concerned with:

• the time required to solve a problem, or

• the space required to solve it?

2. How intrinsically hard is the problem? In other words, is there some reason to believe
that an algorithm that is relatively inefficient, for example one whose time
complexity grows exponentially with the size of the input, is the best we are likely
to be able to come up with to solve the problem at hand?

In the next three chapters, we will develop a theory that helps us to answer question 2
with respect to both time and space requirements.

27.2  The  Complexity  Zoo 

We  are  going  to  discover  that,  just  as  we  were  able  to  build  a  hierarchy  of  language  classes 
based  on  the  power  of  the  automaton  required  to  solve  the  membership  problem,  we  can 
build  a  hierarchy  of  problem  classes  based  on  the  complexity  of  the  best  algorithm  that 
could  exist  to  solve  the  problem.  We'll  consider  problems  that  are  intrinsically  “easy”  or 
tractable,  by  which  we  will  mean  that  they  can  be  solved  in  time  that  grows  only  by  some 
polynomial  function  of  the  size  of  the  input.  And  we’ll  consider  problems  (like  the  trav¬ 
eling  salesman  problem)  that  appear  to  be  intrinsically  “hard”  or  intractable,  by  which 
we  mean  that  the  time  required  to  execute  the  best  known  algorithm  grows  exponential¬ 
ly  (or  worse)  in  the  size  of  the  input. 

Some of the complexity classes that we will describe are large and play important roles
in characterizing the practical solvability of the problems that they contain. For example,
the first class that we will define is P, the class of problems that can be solved by a deterministic
algorithm in polynomial time. All of the context-free languages (including the regular
ones) are in P. So is deciding whether a number is prime or whether a graph is connected.

We  will  also  describe  a  large  and  important  class  called  NP-complete.  No  efficient 
algorithm  for  solving  any  NP-complete  problem  is  known. The  algorithms  that  we  do 
have  all  require  some  kind  of  nontrivial  search.  For  example,  the  traveling  salesman 
problem  is  NP-complete.  So  is  deciding  whether  a  Boolean  formula  is  satisfiable.  (A 
straightforward  search-based  approach  to  solving  this  problem  simply  tries  all  possible 
assignments  of  truth  values  to  the  variables  of  an  input  formula.) 
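
The straightforward search just described can be sketched in a few lines. As a simplifying assumption of this example (the text allows arbitrary Boolean formulas), the formula is given in conjunctive normal form as a list of clauses, where the integer k stands for variable k and -k for its negation.

```python
from itertools import product


def satisfiable(clauses, num_vars):
    """Try every assignment of truth values; return True if one satisfies all clauses."""
    for assignment in product([False, True], repeat=num_vars):
        if all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False


# (x1 or x2) and (not x1 or x2) and (not x2 or x3)
print(satisfiable([[1, 2], [-1, 2], [-2, 3]], 3))
```

With v variables the loop may examine all $2^v$ assignments, which is the nontrivial search the paragraph refers to.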

For a variety of reasons, people have found it useful to define many other classes of
problems as well. Some of these classes are large and include languages of substantial practical
interest. Many others are small and contain problems of more limited interest. There
are classes that are known to be subclasses of other classes. There are classes that are
known to be mutually disjoint. And there are pairs of classes whose relationship to each
other is unknown. The Complexity Zoo is a catalogue of known complexity classes. At
the time that this sentence is being written, it contains 460 classes, with new ones still being
added. We will mention only a small fraction of them in the next few chapters. But the others
are defined using the same kinds of techniques that we will use. In each case, the goal is
to group together a set of problems that share some significant characteristic(s).
27.3 Characterizing Problems

In order to be able to compare very different kinds of problems, we will need a single
framework in which to describe them. Just as we did in Parts II, III, and IV of this book,
we will describe problems as languages to be decided. So we will prove complexity results
for some of the languages we have already discussed, including:

• $\{w \in \{a, b\}^* :$ no two consecutive characters are the same$\}$ (a typical regular language),

• $\{a^i b^j c^k : i, j, k \geq 0$ and $(i \neq j)$ or $(j \neq k)\}$ (an example of a context-free language),

• $A^nB^nC^n = \{a^n b^n c^n : n \geq 0\}$ (an “easy” language that is not context-free), and

• SAT = {<w> : w is a wff in Boolean logic and w is satisfiable} (a “hard” language
that is not context-free).

We  will  describe  both  time  and  space  complexity  in  terms  of  functions  that  are  de¬ 
fined  only  for  deciding  Turing  machines  (i.e., Turing  machines  that  always  halt).  So  our 
discussion  of  the  complexity  of  languages  will  be  restricted  to  the  decidable  languages. 
Thus  we  will  not  be  able  to  make  any  claims  about  the  complexity  of  languages  such  as: 

• H = {<M, w> : Turing machine M halts on input string w}, or

• PCP = {<P> : the Post Correspondence Problem instance P has a solution}.

If we were not restricting our attention to decision problems (whose output is a single
bit), we might discover problems that appear hard simply because they require very long answers.
For example, consider the Towers of Hanoi problem, which we describe in R2. Suppose
that we wanted to describe the complexity of the most efficient algorithm that, on input
n, outputs a sequence of moves that would result in n disks being moved from one pole to
another. It is possible to prove that the shortest such sequence contains $2^n - 1$ moves. So
any algorithm that solves this problem must run for at least $2^n - 1$ steps (assuming that it
takes at least one step to write each move). And it needs at least $2^n - 1$ memory cells to
store the output sequence as it is being built. Regardless of how efficiently each move can be
chosen, both the time complexity and the space complexity of any algorithm that solves this
problem must be exponential simply because the length of the required answer is.
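
The exponential length of the answer is easy to observe directly. The following sketch uses the standard recursive solution to the Towers of Hanoi; the pole names "A", "B", and "C" are arbitrary labels chosen for this example.

```python
def hanoi(n, source="A", target="C", spare="B"):
    """Return the list of moves that transfers n disks from source to target."""
    if n == 0:
        return []
    # Move n - 1 disks out of the way, move the largest disk, then finish.
    return (hanoi(n - 1, source, spare, target)
            + [(source, target)]
            + hanoi(n - 1, spare, target, source))


for n in range(1, 6):
    print(n, len(hanoi(n)))
```

The move count satisfies T(n) = 2T(n - 1) + 1 with T(0) = 0, so the output list has length $2^n - 1$ no matter how cheaply each individual move is chosen.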

Contrast this with the traveling salesman problem. Given n cities, a solution is an ordered
list of the n cities. So the length of a solution is approximately the same as the
length of the input. The complexity of solving the problem arises not from the need to
compose a large answer but from the apparent need to search a large space of possible
short answers. By choosing to cast all of our problems as decision problems, we standardize
(to one bit) the length of the solutions that will be produced. Then we can compare
problems by asking about the complexity, with respect to time or space or both, of
computing that one bit. (We will see, at the end of the next section, how the traveling
salesman problem can be converted to a decision problem.)

27.3.1  Choosing  an  Encoding 

Recall that we argued, in Section 3.2, that restricting our attention to the broad task of
language recognition did not tie our hands behind our backs since other kinds of
problems can be encoded as languages to be decided. So, for example, we will prove
complexity results for some languages that are derived from questions we might ask
about graphs. For example, we can analyze the complexity of:

• CONNECTED = {<G> : G is an undirected graph and G is connected}. An undirected
graph is connected iff there exists a path from each vertex to every other vertex.

• HAMILTONIAN-CIRCUIT = {<G> : G is an undirected graph and G contains
a Hamiltonian circuit}. A Hamiltonian circuit is a path that starts at some vertex s,
ends back in s, and visits each other vertex in G exactly once.

When  our  focus  was  on  decidability,  we  did  not  concern  ourselves  very  much  with 
the  nature  of  the  encodings  that  we  used.  One  exception  to  this  arose  in  Section  3.2, 
when  we  showed  one  encoding  for  an  integer  sum  problem  that  makes  the  resulting 
language  regular,  while  a  different  encoding  results  in  a  nonregular  language. 

But  now  we  want  to  make  claims  not  just  about  decidability  but  about  the  efficiency 
of  decidability.  In  particular,  we  are  going  to  want  to  describe  both  the  time  and  the 
space  requirements  of  a  deciding  program  as  a  function  of  the  length  of  the  program’s 
input.  So  it  may  matter  what  encoding  we  choose  (and  thus  how  long  each  input  string 
is).  Most  of  the  time,  it  will  be  obvious  what  constitutes  a  reasonable  encoding. 

One important place where it may not be obvious is the question of what constitutes
a reasonable encoding of the natural numbers. We will take as reasonable an encoding
in any base greater than or equal to 2. So we’ll allow, for example, both binary and decimal
encodings. We will not consider unary encodings. The reason for this distinction is
straightforward: it takes n characters to encode n in unary (letting the empty string
stand for 0, 1 for 1, and so forth). But for any base $b \geq 2$, the string encoding of n base
b has length $\lfloor \log_b n \rfloor + 1$ (where $\lfloor x \rfloor$, read as “floor of x”, is the largest natural number
less than or equal to x). So the length of the encoding grows only as the logarithm of n, rather
than as n. Looked at from the other direction, the length of the string required to encode
n in unary grows as $2^k$, where k is the length of the string required to encode n in
any base $b \geq 2$.
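
The difference between the two kinds of encoding can be seen by counting digits. In this sketch, `encoded_length` is a hypothetical helper that counts digits by repeated integer division, which avoids floating-point logarithms; it assumes n is at least 1.

```python
def encoded_length(n, base):
    """Number of digits needed to write n >= 1 in the given base >= 2."""
    length = 0
    while n > 0:
        n //= base
        length += 1
    return length


for n in (8, 1000, 10**6):
    # unary length, binary length, decimal length
    print(n, n, encoded_length(n, 2), encoded_length(n, 10))
```

For n = 8 the binary length is 4, which equals $\lfloor \log_2 8 \rfloor + 1$; this is the case that requires the floor to be "less than or equal to" rather than strictly "less than".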

As long as we consider only bases greater than 1, the choice of base changes the
length of any number’s encoding only by some constant factor. This is true since, for
any two positive integers a and b greater than 1:

$$\log_a x = \log_a b \cdot \log_b x.$$

As we’ll see shortly, we are going to ignore constant factors in almost all of our
analyses. So, in particular, the constant $\log_a b$ will not affect the analyses that we will
do. We’ll get the same analysis with any base $b \geq 2$. With this encoding decision in
hand, we'll be able to analyze the complexity of languages such as:

• PRIMES = {w : w is the binary encoding of a prime number}.

But keep in mind one consequence of this encoding commitment: Consider any program
P that implements a function on the natural numbers. Suppose that, given the
number k as input, P executes $c_1 \cdot k$ steps (for some constant $c_1$). It might seem natural
to say P executes in time that is linear in the size of its input. But the length of the actual
input to P will be $\log_b k$, where b is greater than 1. So, if we describe the number of
steps P executes as a function of the length of its input, we will get $c_1 \cdot 2^{\log_2 k}$. Thus P executes
in time that grows exponentially in the length of its input.
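
A tiny sketch makes the point concrete. The function `steps_linear_in_value` is a hypothetical program that performs one step per unit of its numeric input (so $c_1 = 1$), and `input_length` measures the length of the binary encoding of that input.

```python
def steps_linear_in_value(k):
    """Perform k unit steps and report how many were taken."""
    steps = 0
    for _ in range(k):
        steps += 1
    return steps


def input_length(k):
    """Length of the binary encoding of k >= 1."""
    return k.bit_length()


for k in (5, 1000, 10**6):
    length = input_length(k)
    print(k, length, steps_linear_in_value(k), 2 ** (length - 1), 2 ** length)
```

For an input of length L, the step count always lies between $2^{L-1}$ and $2^L$, so it is exponential in L even though it is linear in k.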

What about encodings for graph problems such as the ones we mentioned above? We
consider two reasonable encodings for a graph G = (V, E), where V is a set of vertices
and E is a set of edges. Let n = |V| (the number of vertices in V). For both encodings, we
begin by naming the vertices with the integers from 1 to n. Then we may:

• Represent G as a list of edges. This is the technique that we used in Example 3.6. We
will represent each vertex with the binary string that encodes its name. We will represent
an edge by the pair of binary strings corresponding to the start and the end
vertices of the edge. Then we can represent G by a sequence of edges. The binary
strings will be separated by the character /, and we'll begin each encoding with the
binary encoding of n. Thus the string 101/1/10/10/11/1/100/10/101 would encode
a graph with five vertices and four edges. The maximum number of edges in G is $n^2$
(or $n^2/2$ if G is undirected). The number of characters required to encode a single
vertex is $\lfloor \log_2 n \rfloor + 1$. The number of characters required to encode a single edge
plus the delimiter ahead of it is then $2 \cdot \lfloor \log_2 n \rfloor + 4$. So the maximum length of
the string that encodes G is bounded by:

$$n^2 (2 \cdot \lfloor \log_2 n \rfloor + 4) + \lfloor \log_2 n \rfloor + 1.$$

• Represent G as an adjacency matrix, as described in A.3.2. The matrix will have n
rows and n columns. The value stored in cell (i, j) will be 1 if G contains an edge
from vertex i to vertex j; it will be 0 otherwise. So the value of each cell can be encoded
as a single binary digit and the entire matrix can be encoded as a binary
string of length:

$$n^2.$$

In either case, the size of the representation of G is a polynomial function of the
number of vertices in G. The main question that we are going to be asking about the
problems we consider is whether or not there exists an algorithm that solves the problem
in some amount of time that grows as no more than some polynomial function of
the size of the input. In that case, the answer will be the same whether we describe the
size of G as simply the number of vertices it contains or we describe it as the length of
one of the two string encodings (an edge list or an adjacency matrix) that we just
described.
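
Both encodings are simple to produce. In the sketch below, the function names are hypothetical, vertices are named 1 through n as in the text, and the adjacency matrix is written out row by row (the row-major order is an assumption of this example).

```python
def edge_list_encoding(n, edges):
    """Binary encoding of n, then each edge's two endpoints, separated by '/'."""
    parts = [format(n, "b")]
    for u, v in edges:
        parts += [format(u, "b"), format(v, "b")]
    return "/".join(parts)


def adjacency_matrix_encoding(n, edges, directed=False):
    """n * n binary digits, row by row; cell (i, j) is 1 iff there is an edge i -> j."""
    cells = [["0"] * n for _ in range(n)]
    for u, v in edges:
        cells[u - 1][v - 1] = "1"
        if not directed:
            cells[v - 1][u - 1] = "1"
    return "".join("".join(row) for row in cells)


edges = [(1, 2), (2, 3), (1, 4), (2, 5)]
print(edge_list_encoding(5, edges))
print(adjacency_matrix_encoding(5, edges))
```

The first line printed reproduces the five-vertex, four-edge string given in the text, and the second has exactly $n^2 = 25$ characters.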


27.3.2  Converting  Optimization  Problems  into  Languages 

But now let’s return to the traveling salesman problem. One way to think of the TSP is
that it is the Hamiltonian circuit problem with a twist: We've added distances (or, more
generally, costs) to the edges. And we're no longer interested simply in knowing
whether a circuit exists. We insist on finding the shortest (or cheapest) one. We call
problems like TSP, in which we must find the “best” solution (for some appropriate
definition of “best”), optimization problems.
We can convert an optimization problem into a decision problem by placing a
bound on the cost of any solution that we will accept. So, for example, we will be able
to analyze the complexity of the language:

• TSP-DECIDE = {<G, cost> : <G> encodes an undirected graph with a positive
distance attached to each of its edges and G contains a Hamiltonian circuit whose
total cost is less than cost}.

It  may  feel  that  we  have  lost  something  in  this  transformation.  Suppose  that  what  we 
really  want  to  know  is  how  hard  it  will  be  to  find  the  best  Hamiltonian  circuit  in  a  graph. 
The  modified  form  of  the  problem  that  we  have  described  as  TSP-DECIDE  seems  in 
some  sense  easier,  since  we  need  only  answer  a  yes/no  question  and  we’re  given  a  bound 
above  which  we  need  check  no  paths.  If  we  found  an  efficient  algorithm  that  decided 
TSP-DECIDE,  we  might  still  not  have  an  efficient  way  of  solving  the  original  problem. 
If, on the other hand, there is no efficient procedure for deciding TSP-DECIDE, then
there can be no efficient procedure for solving the original problem (since any such
procedure could be turned into an efficient procedure for deciding TSP-DECIDE). The
time required to decide TSP-DECIDE is a lower bound on the time required to solve
the original problem. And what we're going to see is that no efficient procedure for
deciding TSP-DECIDE is known and it appears unlikely that one exists.
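The lower-bound argument rests on a simple reduction: any procedure that solves the optimization problem can be turned into a decider for TSP-DECIDE with one extra comparison. The following Python sketch illustrates the idea. The brute-force optimizer `tsp_optimize`, the representation of a graph as a dictionary keyed by `frozenset` edges, and all function names are assumptions made for this illustration; the optimizer is exponential and merely stands in for whatever solving procedure one might have.

```python
from itertools import permutations

def tsp_optimize(vertices, dist):
    """Return the cost of the cheapest Hamiltonian circuit, or None if there is none.

    Assumption: `dist` maps frozenset({u, v}) to a positive cost for each
    undirected edge. Brute force: tries every ordering of the other vertices.
    """
    if len(vertices) < 3:
        return None                      # no circuit in a simple graph this small
    first, rest = vertices[0], vertices[1:]
    best = None
    for order in permutations(rest):
        tour = (first,) + order + (first,)
        cost = 0
        for u, v in zip(tour, tour[1:]):
            edge = frozenset((u, v))
            if edge not in dist:         # missing edge: this ordering is not a circuit
                cost = None
                break
            cost += dist[edge]
        if cost is not None and (best is None or cost < best):
            best = cost
    return best

def tsp_decide(vertices, dist, bound):
    """Decide TSP-DECIDE: is there a Hamiltonian circuit of total cost less than bound?"""
    best = tsp_optimize(vertices, dist)
    return best is not None and best < bound

# A small hypothetical instance: a 4-cycle with two expensive diagonals.
V = ["a", "b", "c", "d"]
D = {frozenset(e): w for e, w in [
    (("a", "b"), 1), (("b", "c"), 2), (("c", "d"), 1),
    (("d", "a"), 2), (("a", "c"), 5), (("b", "d"), 5),
]}
print(tsp_optimize(V, D), tsp_decide(V, D, 7), tsp_decide(V, D, 6))
```

Because `tsp_decide` adds only a comparison to `tsp_optimize`, an efficient optimizer would immediately give an efficient decider, which is the contrapositive used in the text.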


27.4  Measuring  Time  and  Space  Complexity 

Before we can begin to analyze problems to determine how fundamentally hard they
are, we need a way to analyze the time and space requirements of specific programs.


27.4.1  Choosing  a  Model  of  Computation 

If we are going to say that a program, running on some particular input, executes p
steps or uses m memory locations, we need to know what counts as a step or a memory
location. Consider, for example, the following simple function tally, which returns the
product of the integers in an input array:

tally(A: vector of n integers, n: integer) =
    result = 1.
    For i = 1 to n do:
        result = result * A[i].
    end.
    Return result.


Suppose that tally is invoked on an input vector A with 10 elements. How many
steps does it run before it halts? One way to answer the question would be to count
each line of code once for each time it is executed. So the initialization of result is done
once. The multiplication is done 10 times. The return statement is executed once. But
how shall we count the statement “for i = 1 to n do:” and the end statement? We could
count the for statement 10 times and thus capture the fact that the index variable is
incremented and compared to n 10 times. Then we could skip counting the end statement
entirely. Or we could count the end statement 10 times and assume that that's where
the index variable is compared to 10. So we might end up with the answer 22 (i.e.,
1 + 1 + 10 + 10, which we get if we don't count executions of the end statement). Or
we might end up with the answer 32 (i.e., 1 + 1 + 10 + 10 + 10, which we get if we
count both the end and the for statement 10 times).
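The two counting conventions can be made concrete with a short Python sketch. It is only an illustration: the function name `tally_steps` and the `count_end` flag are invented here, and the choice of which lines to charge is exactly the modeling decision under discussion.

```python
def tally_steps(A, count_end=False):
    """Run tally on A and count executed statements.

    The `for` line is charged once per iteration. The `end` line is charged
    once per iteration only when count_end is True.
    """
    steps = 0
    result = 1
    steps += 1                  # result = 1
    for x in A:
        steps += 1              # the for statement (increment and compare)
        result *= x
        steps += 1              # the multiplication
        if count_end:
            steps += 1          # the end statement
    steps += 1                  # Return result
    return result, steps

A = list(range(1, 11))          # a 10-element input vector
print(tally_steps(A)[1])                    # 22
print(tally_steps(A, count_end=True)[1])    # 32
```

Under either convention the count is a linear function of the number of elements in A, which is why the choice will turn out not to matter.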

As we'll soon see, this is a difference that won’t matter in the kinds of analyses that
we will want to do, since, using either metric, we can say that the number of steps grows
linearly with the number of elements in A. But there is another problem here. Should
we say that the amount of time required to increment the index variable is the same as
the  amount  of  time  required  to  multiply  two  (possibly  large)  numbers?  That  doesn't 
seem  to  make  sense.  In  particular,  as  the  number  of  elements  of  A  increases,  the  size  of 
result  increases.  So,  depending  on  how  integers  are  represented,  a  real  computer  may 
require  more  time  per  multiplication  as  the  number  of  elements  of  A  increases.  In  that 
case,  it  would  no  longer  be  true  that  the  number  of  steps  grows  only  linearly  with  the 
length  of  A. 

Now  consider  an  analysis  of  the  space  requirements  of  tally.  One  simple  way  to 
do  such  an  analysis  is  to  say  that,  in  addition  to  the  memory  that  holds  its  inputs, 
tally  requires  two  memory  locations,  one  to  hold  the  index  variable  i  and  another  to 
hold the accumulated product in result. Looked at this way, the amount of additional
space required by tally is a constant (i.e., 2), independent of the size of its input. But what
happens if we again consider that the size of result may grow as each new element of
A is multiplied into it? In that case, the number of bits required to encode result may
also grow as the number of elements of A grows. Again the question arises, “Exactly
what should we count?”
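The growth of result is easy to observe in a language with arbitrary-precision integers. The sketch below (the helper name `result_bits` is an assumption for this illustration) records how many bits the running product needs after each multiplication.

```python
def result_bits(A):
    """Return the bit length of the running product after each multiplication."""
    result = 1
    sizes = []
    for x in A:
        result *= x
        sizes.append(result.bit_length())
    return sizes

print(result_bits([1000] * 5))   # [10, 20, 30, 40, 50]
```

With every element equal to 1000, each multiplication adds about ten bits, so neither "one step per multiplication" nor "one memory location for result" is obviously the right accounting.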

We will solve both of these problems by choosing one specific model of computation:
the Turing machine. We will count execution steps in measuring time and visited tape
squares in measuring space. More precisely:

• We will allow Turing machines with any fixed size tape alphabet. Note that if we
made a more restrictive assumption and allowed only two tape symbols, the number
of steps and tape squares might increase but only by some constant factor.

• We will allow only one-tape Turing machines. Would it matter if we relaxed this
restriction and allowed multiple tapes? Recall that we showed, in Section 17.3.1, that
the number of steps required to execute a program on a one-tape machine grows as
at most the square of the number of steps required to execute the same program on
a multiple-tape machine. So, if such a factor doesn't matter, we can allow
multiple-tape machines for convenience.

• We will consider both deterministic and nondeterministic Turing machines. We will
describe different complexity functions for the two of them and explore how they
relate to each other. It seems likely, but no one has yet succeeded in proving, that
there are problems that require exponentially more steps to solve on a deterministic
machine than they do on a nondeterministic one.

Of course, we rarely care about the efficiency of actual Turing machines. (And we
know that, even for some simple problems, straightforward Turing machines may
seem very inefficient.) We care about the efficiency of real computers. But we showed,
in Section 17.4, that the number of steps required to simulate a simple but realistic
computer architecture on a one-tape deterministic Turing machine may grow as at
most the sixth power of the number of steps required by the realistic machine. Almost
all of the complexity analyses that we will do will ignore polynomial factors. When we
are doing that, we may therefore describe programs in a more conventional
programming style and count steps in the obvious way.


27.4.2 Defining Functions that Measure Time and Space Requirements

If  we  are  given  some  particular  Turing  machine  M  and  some  particular  input  w,  then 
we  can  determine  the  exact  number  of  steps  that  M  executes  when  started  with  w  on  its 
tape.  We  can  also  determine  exactly  the  number  of  tape  squares  that  M  visits  in  the 
process.  But  we'd  like  to  be  able  to  describe  M  more  generally  and  ask  how  it  behaves 
on  an  arbitrary  input. 

To do that, we define two functions, timereq and spacereq. The domain of both
functions is the set of Turing machines that halt on all inputs. The range of both is the set of
functions that map from the natural numbers to the natural numbers. The function
timereq(M) measures the time complexity of M; it will return a function that describes
how the number of steps that M executes is related to the length of its input. Similarly,
the function spacereq(M) will define the space complexity of M; it will return a
function that describes the number of tape squares that M visits as a function of the length
of its input.

Specifically,  we  define  timereq  as  follows: 

• If M is a deterministic Turing machine that halts on all inputs, then the value of
timereq(M) is the function f(n) defined so that, for any natural number n, f(n) is the
maximum number of steps that M executes on any input of length n.

• If M is a nondeterministic Turing machine all of whose computational paths halt on
all inputs, then think of the set of computations that M might perform as a tree, just
as we did in Section 17.3.2. We will not measure the number of steps in the entire
tree of computations. Instead we will consider just individual paths and we will
measure the length of the longest one. So the value of timereq(M) is the function
f(n) defined so that, for any natural number n, f(n) is the number of steps on the
longest path that M executes on any input of length n.

Analogously, we define spacereq as follows:

• If M is a deterministic Turing machine that halts on all inputs, then the value of
spacereq(M) is the function f(n) defined so that, for any natural number n, f(n) is
the maximum number of tape squares that M reads on any input of length n.

• If M is a nondeterministic Turing machine all of whose computational paths halt on
all inputs, then the value of spacereq(M) is the function f(n) defined so that, for any
natural number n, f(n) is the maximum number of tape squares that M reads on
any path that it executes on any input of length n.

Notice that both timereq(M) and spacereq(M), as we have just defined them,
measure the worst-case performance of M. In other words, they measure the resource
requirements of M on the inputs that require the most resources. An alternative
approach would be to define both functions to return the average over all inputs. So,
for example, we might define timereq_average(M) to be the function f(n) that returns
the average number of steps that M executes on inputs of length n.
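The difference between the worst-case and average-case definitions can be sketched in Python. In this illustration a Python function that returns a step count stands in for a Turing machine; the toy scanner `scan_steps`, the two-letter alphabet, and the helper names are assumptions made here, not constructions from the text.

```python
from itertools import product

def scan_steps(w):
    """Steps used by a toy scanner that moves right until it reads a 'b'
    or reaches the blank just past the end of w."""
    steps = 0
    for ch in w:
        steps += 1
        if ch == "b":
            return steps
    return steps + 1            # one more step to read the blank

def timereq(steps, alphabet, n):
    """Worst case: the maximum of steps(w) over all inputs w of length n."""
    return max(steps("".join(t)) for t in product(alphabet, repeat=n))

def timereq_average(steps, alphabet, n):
    """Average case: the mean of steps(w) over all inputs w of length n."""
    counts = [steps("".join(t)) for t in product(alphabet, repeat=n)]
    return sum(counts) / len(counts)

print(timereq(scan_steps, "ab", 10))           # 11
print(timereq_average(scan_steps, "ab", 10))   # 1.9990234375
```

For this scanner the worst case grows linearly with n (the all-a input), while the average over all inputs stays below 2, since half of all inputs have a b in the first position.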

We have chosen to focus on worst-case performance, both because we would like to
know an upper bound on the resources required to solve a problem and because it is, in
most cases, easier to determine. We should keep in mind, however, that it is possible for
worst-case performance and average-case performance to be very different. For
example, an algorithm that exploits a hash table may take, on average, constant time to look
up a value. But, if all the entries happen to hash to the same location, it may take time
that is proportional to the number of entries in the table.


The  fact  that  average-case  and  worst-case  may  be  very  different  can  be 
exploited  by  hackers.  (J.4.2) 


The good news about the difference between average-case and worst-case is that,
for many real problems, the worst case is very rare. For example, in Chapter 30, we will
describe the design of randomized algorithms that solve some hard problems quickly
with probability almost equal to one.


EXAMPLE 27.1 Analyzing the Turing Machine that Decides $A^nB^nC^n$

Consider the deterministic Turing machine M that we built in Example 17.8. It
decides the language $A^nB^nC^n = \{a^n b^n c^n : n \ge 0\}$ and it operates as follows:

1. Move right onto w. If the first character is □, halt and accept.

2. Loop:

2.1. Mark off an a with a 1.

2.2. Move right to the first b and mark it off with a 2. If there isn't one or if
there is a c first, halt and reject.

2.3. Move right to the first c and mark it off with a 3. If there isn’t one or if
there is an a first, halt and reject.

2.4. Move all the way back to the left, then right again past all the 1’s (the
marked off a’s). If there is another a, go back to the top of the loop. If
there isn't, exit the loop.

3. All a’s have found matching b's and c's and the read/write head is just to
the right of the region of marked off a’s. Continue moving left to right to
verify that all b’s and c's have been marked. If they have, halt and accept.
Otherwise halt and reject.

We can analyze M and determine timereq(M) as follows: Let n be the length
of the input string w. First, since we must determine the number of steps that M
executes in the worst case, we will not consider the cases in which it exits the loop
in statement 2 prematurely. So we consider only cases where there are at least as
many b's and c’s (in the right order) as there are a’s. In all such cases, M executes
the statement-2 loop once for every a in the input.

Let’s continue by restricting our attention to the case where $w \in A^nB^nC^n$.
Then, each time through the loop, M must, on its way to the right, visit every
square that contains an a, every square that contains a b, and, on average, half the
squares that contain a c. And it must revisit them all as it scans back to the left.
Since each letter occurs n/3 times, the average number of steps executed each
time through the loop is $2(n/3 + n/3 + n/6)$. The loop will be executed n/3 times,
so the total number of steps executed by the loop is $2(n/3)(n/3 + n/3 + n/6)$.
Then, in the last execution of statement 2.4, combined with the execution of
statement 3, M must make one final sweep all the way through w. That takes an
additional n steps. So the total number of steps M executes is:

$2(n/3)(n/3 + n/3 + n/6) + n.$

Now suppose instead that $w \notin A^nB^nC^n$ because it contains either extra
characters after the matched regions or extra a’s or b’s embedded in the matching
regions. So, for example, w might be aaabbbbbccc or aabbcca. In these cases,
the number of steps executed by the loop of statement 2 is less than the
number we computed above (because the loop is executed fewer than n/3 times).
Since timereq must measure the number of steps in the worst case, for any
input of length n, we can therefore ignore inputs such as these in our analysis.
So we can say that:

$timereq(M) = 2(n/3)(n/3 + n/3 + n/6) + n.$

Using ideas that we will formalize shortly, we can thus say that the time
required to run M on an input of length n grows as $n^2$.

Analyzing spacereq(M) is simpler. M uses only those tape squares that contain its
input string, plus the blank on either side of it. So we have:

$spacereq(M) = n + 2.$
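To make the count in Example 27.1 concrete, the following Python sketch simulates the machine described in steps 1 through 3 and counts head moves. It is an illustration under stated assumptions rather than the machine of Example 17.8 itself: the tape is a Python list, only head moves are charged as steps (writes are free), an input whose first character is neither blank nor a is rejected immediately, and the names `run_anbncn` and `seek_right` are invented here.

```python
BLANK = "□"

def seek_right(tape, head, skip, target):
    """Move right from head until `target` is read.

    Returns (new_head, moves, found). Reading any symbol that is neither the
    target nor in `skip` (including the blank) stops the scan with found=False.
    """
    moves = 0
    while True:
        head += 1
        moves += 1
        if tape[head] == target:
            return head, moves, True
        if tape[head] not in skip:
            return head, moves, False

def run_anbncn(w):
    """Simulate the marking machine on w. Returns (accepted, head_moves)."""
    tape = [BLANK] + list(w) + [BLANK]
    head, moves = 0, 0

    head += 1; moves += 1                      # step 1: move right onto w
    if tape[head] == BLANK:
        return True, moves
    if tape[head] != "a":                      # assumption: reject at once
        return False, moves

    while True:
        tape[head] = "1"                       # 2.1: mark off an a
        head, m, found = seek_right(tape, head, "a2", "b")   # 2.2
        moves += m
        if not found:
            return False, moves
        tape[head] = "2"
        head, m, found = seek_right(tape, head, "b3", "c")   # 2.3
        moves += m
        if not found:
            return False, moves
        tape[head] = "3"
        while tape[head] != BLANK:             # 2.4: all the way back left
            head -= 1; moves += 1
        head += 1; moves += 1
        while tape[head] == "1":               # then right past the 1's
            head += 1; moves += 1
        if tape[head] != "a":
            break

    while tape[head] != BLANK:                 # step 3: final sweep
        if tape[head] not in "23":
            return False, moves
        head += 1; moves += 1
    return True, moves

for k in (1, 2, 4, 8):
    n = 3 * k
    print(n, run_anbncn("a" * k + "b" * k + "c" * k), 5 * n * n / 9 + n)
```

On an input $a^k b^k c^k$ with $k \ge 1$ (so n = 3k), this simulation makes $5k^2 + 4k + 1$ moves, that is, $(5/9)n^2 + (4/3)n + 1$. The lower-order terms differ slightly from the estimate in Example 27.1, which averages over iterations, but the leading term $(5/9)n^2$ agrees.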


27.5 Growth Rates of Functions

Let A be a program and suppose that timereq(A) = $2n$. By almost any standard, A is
efficient. We can probably afford to run A on any inputs that anyone can afford to
construct. But now consider a program B, where timereq(B) = $2^n$. This second program is
a lot less efficient than A is. And there are inputs of quite reasonable size on which B
would not yet have finished if it had started at the instant of the Big Bang. Some
functions grow very much faster than others, as shown in Figure 27.1 (in which both the
x-axis, corresponding to n, and the y-axis, corresponding to f(n), are logarithmic).

As we develop a theory of complexity, we will find that problems that can be solved by
algorithms whose time requirement is some polynomial function (e.g., $2n$) will generally
be regarded as tractable. Problems for which the best known algorithm has greater than
polynomial time complexity (e.g., $2^n$) will generally be regarded as intractable.

Problems  that  are  intractable  in  this  sense  are  likely  to  remain  intractable,  even  as 
computers get faster. For example, if computer speed increases by a factor of 10, we can
think of timereq as decreasing by a factor of 10. The only effect that has on the growth
rate chart that we just presented is to shift all the lines down a barely perceptible amount.
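A rough calculation shows how little a faster machine helps with exponential growth. The sketch below is an illustration only: the machine speed of $10^9$ steps per second and the figure of 13.8 billion years for the age of the universe are assumptions chosen for the example, and `largest_n` is a helper invented here.

```python
STEPS_PER_SECOND = 1e9                           # assumed machine speed
UNIVERSE_SECONDS = 13.8e9 * 365.25 * 24 * 3600   # assumed age of the universe

def largest_n(timereq, budget_steps):
    """Largest n with timereq(n) <= budget_steps, for an increasing timereq.

    Doubles an upper bound, then binary-searches between the bounds.
    """
    lo, hi = 0, 1
    while timereq(hi) <= budget_steps:
        lo, hi = hi, hi * 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if timereq(mid) <= budget_steps:
            lo = mid
        else:
            hi = mid
    return lo

budget = STEPS_PER_SECOND * UNIVERSE_SECONDS

print(largest_n(lambda n: 2 ** n, budget))        # 88
print(largest_n(lambda n: 2 ** n, 10 * budget))   # 91
print(largest_n(lambda n: 2 * n, 10 * budget) / largest_n(lambda n: 2 * n, budget))
```

Under these assumptions, program B started at the Big Bang would by now have finished only inputs of size up to 88, and a machine ten times faster raises that to 91. For program A, the same speedup multiplies the largest feasible input size by ten.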

It is possible that the one thing that might change the intractability picture for some
problems is quantum computing. So far, the only quantum computers that have
been built are so small that quantum computing has not had a practical impact on the
solvability of hard problems. Someday, however, they might. But it is important to keep
in mind that while quantum computing may break through intractability barriers, it
cannot break through computability ones. The proof that we did of the unsolvability of
the halting problem made no appeal to the physical structure of the device that was
hypothesized to implement the halts function. So it applies to quantum computers as well
as to current silicon-based ones.


27.6  Asymptotic  Dominance 

As we analyze problems and the algorithms that can solve them, we may be interested
in one or both of:

• The exact amount of time or space required to run an algorithm on a problem of a
given size. In this case, we may care that one algorithm runs twice as fast as another
one, or that it uses half as much memory. When this happens, the functions
timereq(M) and spacereq(M) are exactly what we need.

• The rate at which the required time or space grows as the size of the problem
grows. In this case, we may be relatively unconcerned with such things as constant
factors, particularly if we are facing the fact that the total required time or space grows
exponentially (or worse) with the size of the problem. In this case, timereq(M) and
spacereq(M) provide detail that may obscure the important factors.

In the analyses that we will do in the next two chapters, we will focus on the second
of these issues. Thus we will, by and large, ignore constant factors and slowly growing
terms. So for example, if $timereq(M_1) = 3n^2 + 23n + 100$ and $timereq(M_2) = 25n^2
+ 4n + 3$, we would like to say that the time complexity of both machines grows as $n^2$.
But before we embark on that analysis, we should point out that, when we are
considering practical algorithms, constant factors and more slowly growing terms may matter. For
instance, in Exercise 27.S, we will compare two algorithms for matrix multiplication. To
multiply two n × n matrices using the obvious algorithm requires time that grows as $n^3$.
An alternative is Strassen's algorithm. We'll see that it requires time that grows as $n^{2.807}$.
But we’ll also see that Strassen’s algorithm can be slower than the straightforward
approach until n crosses a threshold that typically occurs between 500 and 1000.

To  be  able  to  do  the  kind  of  analysis  that  we  wish  to  focus  on,  we’ll  need  to  be 
able  to  compare  two  functions  and  ask  how  they  behave  as  their  inputs  grow.  For 
example,  does  one  of  them  grow  faster  than  the  other?  In  other  words,  after  some 
finite  set  of  small  cases,  is  one  of  them  consistently  larger  than  the  other?  In  that 
case,  we  can  view  the  larger  function  as  describing  an  upper  bound  on  the  smaller 
one.  Or  perhaps  they  grow  at  the  same  rate.  In  that  case,  we  can  view  either  as  de¬ 
scribing  a  bound  on  the  growth  of  the  other.  Or  perhaps,  after  some  finite  number  of 
small  cases,  one  of  them  is  consistently  smaller  than  the  other.  In  that  case,  we  can 
view  the  smaller  one  as  describing  a  lower  bound  on  the  other.  Or  maybe  we  can 
make no consistent claim about the relationship between the two functions. The
theory that we are about to present is a general one that relates functions to each other.
It is not tied specifically to our use, namely to measure the performance of
programs. But it is exactly what we need.


One  reason  that  we  generally  choose  to  ignore  constant  factors,  in  particular, 
is that the Linear Speedup Theorem tells us that, up to a point, any Turing
machine can be sped up by any desired constant factor. (F.2)


Consider any two functions f and g from the natural numbers to the positive reals.
We define five useful relations that may hold between such functions. The first, O, is
introduced in A.7.2. We have been using it informally in the analyses of the algorithms
that we have presented in Parts II, III, and IV. The other four are new:

• Asymptotic upper bound: $f(n) \in O(g(n))$ iff there exists a positive integer k and a
positive constant c such that:

$\forall n \ge k \; (f(n) \le c\,g(n)).$

In other words, ignoring some number of small cases (all those of size less than k),
and ignoring some constant factor c, f(n) is bounded from above by g(n). Another
way to describe this relationship, if the required limit exists, is:

$\lim_{n \to \infty} \frac{f(n)}{g(n)} < \infty.$

In this case, we’ll say that f is “big-Oh” of g or that g asymptotically dominates or
grows at least as fast as f. We can think of g as describing an upper bound on the
growth of f.

• Asymptotic strong upper bound: $f(n) \in o(g(n))$ iff, for every positive c, there
exists a positive integer k such that:

$\forall n \ge k \; (f(n) < c\,g(n)).$

In other words, whenever the required limit exists:

$\lim_{n \to \infty} \frac{f(n)}{g(n)} = 0.$

In this case, we’ll say that f is “little-oh” of g or that g grows strictly faster than f does.

• Asymptotic lower bound: $f(n) \in \Omega(g(n))$ iff there exists a positive integer k and
a positive constant c such that:

$\forall n \ge k \; (f(n) \ge c\,g(n)).$

In other words, ignoring some number of small cases (all those of size less than k),
and ignoring some constant factor c, f(n) is bounded from below by g(n). Another
way to describe this relationship, if the required limit exists, is:

$\lim_{n \to \infty} \frac{f(n)}{g(n)} > 0.$

In this case, we’ll say that f is “big-Omega” of g or that g grows no faster than f.

• Asymptotic strong lower bound: $f(n) \in \omega(g(n))$ iff, for every positive c, there
exists a positive integer k such that:

$\forall n \ge k \; (f(n) > c\,g(n)).$

In other words, whenever the required limit exists:

$\lim_{n \to \infty} \frac{f(n)}{g(n)} = \infty.$

In this case, we’ll say that f is “little-omega” of g or that g grows strictly slower
than f does.

[FIGURE 27.2 O, Ω, and Θ. Three panels, labeled $f(n) \in O(g(n))$, $f(n) \in \Omega(g(n))$, and
$f(n) \in \Theta(g(n))$, each with the point k marked.]


• Asymptotic tight bound: $f(n) \in \Theta(g(n))$ iff there exists a positive integer k and
positive constants $c_1$ and $c_2$ such that:

$\forall n \ge k \; (c_1\,g(n) \le f(n) \le c_2\,g(n)).$

In other words, again assuming the limit exists:

$0 < \lim_{n \to \infty} \frac{f(n)}{g(n)} < \infty.$

In this case, we’ll say that f is “Theta” of g or that g is an asymptotically tight
bound on f. Equivalently, we can define Θ in terms of O and Ω in either of the
following ways:

• $f(n) \in \Theta(g(n))$ iff $f(n) \in O(g(n))$ and $f(n) \in \Omega(g(n))$. In other words,
$f(n) \in \Theta(g(n))$ iff g(n) is both an upper and a lower bound of f(n).

• $f(n) \in \Theta(g(n))$ iff $f(n) \in O(g(n))$ and $g(n) \in O(f(n))$. In other words,
$f(n) \in \Theta(g(n))$ iff f(n) and g(n) are upper bounds of each other.

The graphs shown in Figure 27.2 may help in visualizing the bounds that are defined by
O, Ω, and Θ.
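As a worked illustration of the limit characterizations, take $f(n) = 25n^2 + 4n + 3$ as a sample function. The three limits below (a sketch, valid because each limit exists) place it in $\Theta(n^2)$, $o(n^3)$, and $\omega(n)$:

```latex
\begin{align*}
\lim_{n\to\infty}\frac{25n^2+4n+3}{n^2}
  &= \lim_{n\to\infty}\Bigl(25+\frac{4}{n}+\frac{3}{n^2}\Bigr) = 25,
  && 0 < 25 < \infty \;\Rightarrow\; f(n)\in\Theta(n^2),\\
\lim_{n\to\infty}\frac{25n^2+4n+3}{n^3}
  &= \lim_{n\to\infty}\Bigl(\frac{25}{n}+\frac{4}{n^2}+\frac{3}{n^3}\Bigr) = 0
  && \Rightarrow\; f(n)\in o(n^3),\\
\lim_{n\to\infty}\frac{25n^2+4n+3}{n}
  &= \lim_{n\to\infty}\Bigl(25n+4+\frac{3}{n}\Bigr) = \infty
  && \Rightarrow\; f(n)\in\omega(n).
\end{align*}
```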


EXAMPLE 27.2 Determining O, o, Ω, and Θ from the Definitions

Suppose that we have analyzed the time complexity of some Turing machine M
and determined that:

$timereq(M) = 3n^2 + 23n + 100.$

Then:

• $timereq(M) \in O(n^2)$, which we can prove by finding appropriate values for c
and k. A bit of experimenting will show that we could, for example, let c = 4
and k = 28 since $\forall n \ge 28 \; (3n^2 + 23n + 100 \le 4n^2)$. A direct way to find c and k
in the case of polynomials like this is to observe that, if $n \ge 1$, then:

$3n^2 + 23n + 100 \le 3n^2 + 23n^2 + 100n^2 = 126n^2.$

So let k = 1 and c = 126.

• $timereq(M) \in O(n^3)$, which we can prove by using either of the sets of values
for k and c that we used above.

• $timereq(M) \in o(n^3)$, which we can prove by letting, for any value of c, k be
$\lceil 126/c \rceil + 1$ (where $\lceil x \rceil$, read as "ceiling of x", is the smallest integer greater
than or equal to x). To see why such a k works, again observe that, if $n \ge 1$ then:

$3n^2 + 23n + 100 \le 3n^2 + 23n^2 + 100n^2 = 126n^2.$

So we can assure that $3n^2 + 23n + 100 < c\,n^3$ by assuring that $126n^2 < c\,n^3$.
Solving for n, we get $n > 126/c$.

We can guarantee that k is an integer and that it is greater than 126/c by
setting it to $\lceil 126/c \rceil + 1$. Note that this means that k > 1, so the condition we
required for the first step we did is satisfied.

• $timereq(M) \in \Omega(n)$, which we can prove by letting c = 1 and k = 1, since
$\forall n \ge 1 \; (3n^2 + 23n + 100 \ge n)$.

• $timereq(M) \in \Omega(n^2)$, which we can prove by letting c = 1 and k = 1, since
$\forall n \ge 1 \; (3n^2 + 23n + 100 \ge n^2)$.

• $timereq(M) \in \Theta(n^2)$, which we can prove by noting that $3n^2 + 23n + 100
\in O(n^2)$ and $3n^2 + 23n + 100 \in \Omega(n^2)$. Note that $timereq(M) \notin \Theta(n)$ and
$timereq(M) \notin \Theta(n^3)$.
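The constants chosen in Example 27.2 can be spot-checked numerically. The Python sketch below is finite evidence, not a proof: it tests the inequalities only up to an arbitrary cutoff (`limit`, an assumption of this illustration), and the helper names are invented here.

```python
import math

def f(n):
    """The running time analyzed in Example 27.2."""
    return 3 * n * n + 23 * n + 100

def holds_from(k, c, g, limit=10_000):
    """Check f(n) <= c * g(n) for every n with k <= n <= limit."""
    return all(f(n) <= c * g(n) for n in range(k, limit + 1))

def smallest_k(c, g, limit=10_000):
    """Smallest k such that the bound holds on all of [k, limit], or None."""
    k = None
    for n in range(limit, 0, -1):
        if f(n) <= c * g(n):
            k = n
        else:
            break
    return k

def k_for_little_o(c):
    """The k used in the little-oh argument: ceiling(126 / c) + 1."""
    return math.ceil(126 / c) + 1

square = lambda n: n * n

print(holds_from(28, 4, square), holds_from(1, 126, square))   # True True
print(smallest_k(4, square))                                   # 27
print(k_for_little_o(0.5))                                     # 253
```

Both pairs of constants from the example pass the check. The search also shows that k = 27 would already work with c = 4, which is consistent with the example: the definition asks only for some k, not the smallest one.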


Given two functions f(n) and g(n), it is possible to show that $f(n) \in O(g(n))$ (and
similarly for o, Ω, and Θ) by showing the required constants, as we have just done.
But it is much easier to prove such claims by exploiting a set of facts about
arithmetic with respect to these relations. We’ll state some of these facts in the next two
theorems.


THEOREM 27.1 Facts About O

Theorem: Let $f, f_1, f_2, g, g_1,$ and $g_2$ be functions from the natural numbers to the
positive reals, let a and b be arbitrary real constants, and let $c, c_0, c_1, \dots, c_k, s, t$ be
any positive real constants. Then:

1. $f(n) \in O(f(n))$.

2. Addition:

2.1. $O(f(n)) = O(f(n) + c_0)$ (if we make the assumption, which will always
be true for the functions we will be considering, that $1 \in O(f(n))$).

2.2. If $f_1(n) \in O(g_1(n))$ and $f_2(n) \in O(g_2(n))$ then $f_1(n) + f_2(n) \in O(g_1(n) + g_2(n))$.

2.3. $O(f_1(n) + f_2(n)) = O(\max(f_1(n), f_2(n)))$.

3. Multiplication:

3.1. $O(f(n)) = O(c_0 f(n))$.

3.2. If $f_1(n) \in O(g_1(n))$ and $f_2(n) \in O(g_2(n))$ then $f_1(n) f_2(n) \in O(g_1(n) g_2(n))$.

4. Polynomials:

4.1. If $a \le b$ then $O(n^a) \subseteq O(n^b)$.

4.2. If $f(n) = c_j n^j + c_{j-1} n^{j-1} + \dots + c_1 n + c_0$, then $f(n) \in O(n^j)$.

5. Logarithms:

5.1. For a and b > 1, $O(\log_a n) = O(\log_b n)$.

5.2. If $0 < a < b$ and $c > 1$ then $O(n^a) \subseteq O(n^a \log_c n) \subseteq O(n^b)$.

6. Exponentials (including the fact that exponentials dominate polynomials):

6.1. If $1 < a \le b$ then $O(a^n) \subseteq O(b^n)$.

6.2. If $a \ge 0$ and $b > 1$ then $O(n^a) \subseteq O(b^n)$.

6.3. If $f(n) = c_{j+1} 2^n + c_j n^j + c_{j-1} n^{j-1} + \dots + c_1 n + c_0$, then $f(n) \in O(2^n)$.

6.4. If $s > 1$ then $O(n^t 2^n) \subseteq O(2^{(n^s)})$.

7. Factorial dominates exponentials: If $a \ge 1$ then $O(a^n) \subseteq O(n!)$.

8. Transitivity: If $f(n) \in O(f_1(n))$ and $f_1(n) \in O(f_2(n))$ then $f(n) \in O(f_2(n))$.

Proof: Proofs of these claims, based on the definition of O, are given in F.1 or left
as exercises.

We can summarize some of the key facts from Theorem 27.1 as follows, with the
caveat that the constants a, b, c, and d must satisfy the constraints given in the theorem:

$O(c) \subseteq O(\log_a n) \subseteq O(n^b) \subseteq O(d^n) \subseteq O(n!).$

In other words, factorial dominates exponentials, which dominate polynomials,
which dominate logarithms, which dominate constants.
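A few sample values make the chain tangible. The sketch below evaluates one representative function from each class; the particular representatives ($1$, $\log_2 n$, $n^2$, $2^n$, $n!$) and the name `growth_row` are choices made for this illustration, and a table of values is evidence for the ordering rather than a proof of it.

```python
import math

def growth_row(n):
    """Values at n of one representative from each class in the chain:
    constant, logarithm, polynomial, exponential, factorial."""
    return (1, math.log2(n), n ** 2, 2 ** n, math.factorial(n))

for n in (5, 10, 20, 40):
    print(n, growth_row(n))
```

For these representatives, every row from n = 5 onward is strictly increasing from left to right, and the gaps between adjacent columns widen as n grows.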

THEOREM 27.2 Facts About o

Theorem: Given any functions f and g from the natural numbers to the positive
reals:

• $o(f(n)) \subseteq O(f(n))$

Proof: Proofs of these claims, based on the definitions of O and o, are given in F.1.

EXAMPLE 27.3 Determining O and o from the Properties Theorems

In Example 27.1, we analyzed the time complexity of the Turing machine M and
determined that:

$timereq(M) = 2(n/3)(n/3 + n/3 + n/6) + n = (5/9)n^2 + n.$

So:

• $timereq(M) \in O(n^2)$. It is also true that $timereq(M) \in O(n^3)$.

• $timereq(M) \in o(n^3)$.
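The steps behind Example 27.3 can be written out as follows. This is a sketch of one possible derivation; it uses facts 4.1 and 4.2 of Theorem 27.1 for the O claims and the limit characterization of o for the last claim.

```latex
\begin{align*}
timereq(M) &= 2\cdot\frac{n}{3}\Bigl(\frac{n}{3}+\frac{n}{3}+\frac{n}{6}\Bigr)+n
            = \frac{2n}{3}\cdot\frac{5n}{6}+n
            = \frac{5}{9}n^2+n, \\
\frac{5}{9}n^2+n &\in O(n^2)
            && \text{(fact 4.2, a polynomial of degree 2)}, \\
O(n^2) &\subseteq O(n^3)
            && \text{(fact 4.1, since } 2 \le 3\text{)}, \\
\lim_{n\to\infty}\frac{\frac{5}{9}n^2+n}{n^3}
           &= \lim_{n\to\infty}\Bigl(\frac{5}{9n}+\frac{1}{n^2}\Bigr) = 0
            && \Rightarrow\; timereq(M)\in o(n^3).
\end{align*}
```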


We’ve defined the relations O, o, Ω, and Θ because each of them is useful in
characterizing the way in which timereq(M) and spacereq(M) grow as the length of the input
to M increases. $\Theta(f(n))$ provides the most information since it describes the tightest
bound on the growth of f(n). But most discussions of complexity rely more extensively
on O for two reasons:

• Even when analyzing a particular machine M, it may be easier to prove a claim about
O(timereq(M)) than about Θ(timereq(M)) (and similarly about spacereq(M)). In this
case, it is conventional to make the strongest claim that can be proved. So, for example,
if $timereq(M) \in O(n^3)$ then it must also be true that $timereq(M) \in O(n^4)$. But if we
can prove the former claim, then that is the one we will make. This is the convention
that we have used in analyzing algorithms in Parts II, III, and IV of this book.

• In Chapters 28, 29, and 30, we will move from discussing individual algorithms for
deciding a language to making claims about the inherent complexity of a language
itself. We'll base those claims on the best known algorithm for deciding the
language. Since we often cannot prove that no better algorithm can exist, we will be
unable to make any claim about a lower bound on the complexity of the language.
Thus O will be the best that we can do.

It  is  common  to  say,  informally.  “ M  is  0(J'(n))."  when  we  mean  that  timereq 
(M)eO(f(n)).  We  will  do  this  when  it  causes  no  confusion.  Similarly,  we’ll  say  that 
M  is  polynomial  or  that  M  implements  a  polynomial-time  algorithm  whenever 
timereq(M)  e.O()'{n))  for  some  polynomial  function  J. 


27.7  Algorithmic  Gaps 

Our goal, in the next three chapters, is to characterize problems by their inherent
difficulty. We can close the book on the complexity of a problem L if we can show all of
the following:

1. There exists an algorithm that decides L and that has complexity C₁.

2. Any algorithm that decides L must have complexity at least C₂.

3. C₁ = C₂.

The existence of an algorithm as described in point 1 imposes an upper bound on
the inherent complexity of L since it tells us that we can achieve C₁. The existence of a
proof of a claim as described in point 2 imposes a lower bound on the inherent
complexity of L since it tells us that we can't do better than C₂. If C₁ = C₂, we are done.

What we are about to see is that, for many interesting problems, we are not done.
For all of the problems we will consider, some algorithm is known. So we have an
upper bound on inherent complexity. But, for many of these problems, only very weak
lower bounds are known. Proving lower bounds turns out to be a lot harder than
proving upper bounds. So, for many problems, there is a gap, and sometimes a very
significant one, between the best known lower bound and the best known upper bound. For
example, the best known deterministic algorithm for solving the traveling salesman
problem exactly has timereq ∈ O(2^(n^k)). But it is unknown whether this is the best we
can do. In particular, no one has been able to prove that there could not exist a
deterministic, polynomial-time algorithm for TSP-DECIDE.

The complexity classes that we are about to define will necessarily be based on the
facts that we have. Thus they will primarily be defined in terms of upper bounds. We
will group together problems for which algorithms of similar complexity are known.
We must remain agnostic, for now, on several questions of the form, “Is class CL₁ equal
to class CL₂?” Such questions can be answered only by the discovery of new
algorithms that prove stronger upper bounds or by the discovery of new proofs of
stronger lower bounds.


27.8  Examples* 

Suppose that we have a problem that we wish to solve and an algorithm that solves it.
But we'd like a more efficient one. We might be happy with one that runs, say, twice as
fast as the original one does. But we would be even happier if we could find one for
which the required time grew more slowly as the size of the problem increased. For
example, the original algorithm might be O(2ⁿ), while another one might be O(n³).
Sometimes we will succeed in finding such an algorithm. As we'll see in the next couple
of chapters, sometimes we won't.

27.8.1 Polynomial Speedup

We begin with two examples for which we start with a polynomial algorithm but are
nevertheless able to improve its running time.


EXAMPLE  27.4  Finding  the  Minimum  and  Maximum  in  a  List 

We first consider an easy problem: Given a list of n numbers, find the minimum
and the maximum elements in the list. We can convert this problem into a language
recognition problem by defining the language L = {<list of numbers, number1,
number2> : number1 is the minimum element of the list and number2 is
the maximum element}.

We'll focus on the core of the decision procedure. Its job is to examine a list and
find its minimum and maximum elements. We begin with a simple approach:

   simplecompare(list: list of numbers) =
      max = list[1].
      min = list[1].
      For i = 2 to length(list) do:
         If list[i] < min then min = list[i].
         If list[i] > max then max = list[i].

Rather than trying to count every operation, we'll assume that the time required
by all the other operations is dominated by the time required to do the
comparisons. The straightforward algorithm that we just presented requires
2(n - 1) comparisons. So we can say that simplecompare is O(2n). Or, eliminating
the constant, it is O(n). Can we do better? We notice that if list[i] < min then
it cannot also be true that list[i] > max. So that comparison can be skipped. We
can do even better, though, if we consider the elements of the list two at a time.
We first compare list[i] to list[i + 1]. Then we compare the smaller of the two to
min and the larger of the two to max. This new algorithm requires only
(3/2)(n - 1) comparisons. So, while the time complexity of all three algorithms is
O(n), the last one requires 25% fewer comparisons than the first one did.
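
To make the comparison counts concrete, here is a minimal Python sketch of the first and last approaches. The function names, the 0-based indexing, and the way an even-length list is seeded with its first pair are choices made for this illustration rather than part of the example above. Each function returns the minimum, the maximum, and the number of element comparisons it performed.

```python
from typing import List, Tuple

def simple_minmax(xs: List[int]) -> Tuple[int, int, int]:
    """Two comparisons per remaining element: 2(n - 1) in total."""
    lo = hi = xs[0]
    comparisons = 0
    for x in xs[1:]:
        comparisons += 2
        if x < lo:
            lo = x
        if x > hi:
            hi = x
    return lo, hi, comparisons

def pairwise_minmax(xs: List[int]) -> Tuple[int, int, int]:
    """Three comparisons per pair of elements: about (3/2)n in total."""
    comparisons = 0
    if len(xs) % 2 == 1:
        lo = hi = xs[0]          # odd length: seed with the first element
        start = 1
    else:
        comparisons += 1         # even length: seed with the first pair
        lo, hi = (xs[0], xs[1]) if xs[0] < xs[1] else (xs[1], xs[0])
        start = 2
    for i in range(start, len(xs), 2):
        comparisons += 3
        small, large = (xs[i], xs[i + 1]) if xs[i] < xs[i + 1] else (xs[i + 1], xs[i])
        if small < lo:
            lo = small
        if large > hi:
            hi = large
    return lo, hi, comparisons

data = [5, 3, 9, 1, 7, 2, 8, 6, 4]
print(simple_minmax(data))    # (min, max, comparisons)
print(pairwise_minmax(data))
```

For this nine-element list the simple version performs 2(9 - 1) = 16 comparisons and the pairwise version performs (3/2)(9 - 1) = 12.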


In the next example we return to a problem we considered in Chapter 5: Given a
pattern string and an input text string, does the pattern match anywhere in the text?
We know that this question is decidable and that one way to answer it is to use a finite
state machine. We now consider another way and examine its efficiency.


EXAMPLE  27.5  String  Search  and  the  Knuth-Morris-Pratt  Algorithm 

Define the language:

• STRING-SEARCH = {<t, p> : the string p (the pattern) exists as a substring
somewhere in t (the text string)}.

The following straightforward algorithm decides STRING-SEARCH by looking
for at least one occurrence of the pattern p somewhere in t. It starts at the left
and shifts p one character to the right each time it fails to find a match. (Note that
the characters in the strings are numbered starting with 0.)

   simple-string-search(t, p: strings) =
      i = 0.
      j = 0.
      While i ≤ |t| - |p| do:
         While j < |p| do:
            If t[i + j] = p[j] then j = j + 1.   /* Continue the match.
            Else exit this loop.                 /* Match failed. Need to slide the
                                                    pattern to the right.
         If j = |p| then halt and accept.        /* The entire pattern matched.
         Else:
            i = i + 1.                           /* Slide the pattern one character
                                                    to the right.
            j = 0.                               /* Start over again matching
                                                    pattern characters.
      Halt and reject.                           /* Checked all the way to the end
                                                    and didn't find a match.

Let n be |t| and let m be |p|. In the worst case (in which it doesn't find an
early match), simple-string-search will go through its outer loop almost n times
and, for each of those iterations, it will go through its inner loop m times. So
timereq(simple-string-search) ∈ O(nm).
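
The pseudocode translates almost line for line into Python. The sketch below is one possible rendering, with a comparison counter added for illustration (the counter and the function's return shape are not part of the algorithm as stated). It returns whether the pattern was found and how many character comparisons were made.

```python
from typing import Tuple

def simple_string_search(t: str, p: str) -> Tuple[bool, int]:
    """Slide p across t one position at a time; count character comparisons."""
    comparisons = 0
    i = 0
    while i <= len(t) - len(p):
        j = 0
        while j < len(p):
            comparisons += 1
            if t[i + j] != p[j]:
                break                      # match failed at this alignment
            j += 1
        if j == len(p):
            return True, comparisons       # the entire pattern matched
        i += 1                             # slide the pattern one character right
    return False, comparisons

print(simple_string_search("abcababcabd", "abcabd"))
# A text of 20 a's against "aaaab" forces m comparisons at each of n - m + 1 alignments.
print(simple_string_search("a" * 20, "aaaab"))
```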

Can  we  do  better?  The  answer  is  yes.  We  know,  from  Section  5.4.2,  that,  given  a 
particular  pattern  p,  we  can  build  a  deterministic  finite  state  machine  that  looks 
for  p  in  t  and  executes  only  n  steps.  But  constructing  that  machine  by  hand  for 
each  new  p  isn’t  feasible  if  the  pattern  itself  must  also  be  an  input  to  the  program. 
We  could  use  the  following  algorithm  to  decide  STRING-SEARCH  (where  both 
t  and  p  are  input  to  the  program): 

   string-search-using-FSMs(t, p: strings) =

      1. Build the simple nondeterministic FSM M that accepts any string that
         contains p as a substring.
      2. Let M′ = ndfsmtodfsm(M).     /* Make an equivalent deterministic FSM.
      3. Let M″ = minDFSM(M′).        /* Minimize it.
      4. Run M″ on t.
      5. If it accepts, accept. Else reject.

Step 4 of string-search-using-FSMs runs in n steps. And it is true that steps 1
through 3 need only be done once for each pattern p. The resulting machine M″
can then be used to scan as many input strings as we want. But steps 1 through 3
are expensive since the number of states of M′ may grow exponentially with the
number of states of M (i.e., with the number of characters in p).

So can we beat string-search-using-FSMs? In particular, can we design a search
algorithm whose matching time is linear in n (the length of t) but that can be
efficient in performing any necessary preprocessing of p? The answer to this second
question is also yes. One way to do it is to use the buildkeywordFSM algorithm,
which we presented in Section 6.2.4, to build a deterministic FSM directly from the
pattern. An alternative is to search directly without first constructing an FSM.

The Knuth-Morris-Pratt algorithm does the latter. It is a variant of
simple-string-search that is efficient both in preprocessing and in searching. To see how it
works, we'll begin with an example. Let t and p be as shown here. Simple-string-search
begins by trying to match p starting in position 0:

     0 1 2 3 4 5 6 7 8
t:   a b c a b a b c a b d
p:   a b c a b d
               *

We've marked with an * the point at which simple-string-search notices that its
first attempt to find a match has failed. Simple-string-search will increment i by 1,
thus shifting the pattern one character to the right, and then it will try again, this
time checking:

     0 1 2 3 4 5 6 7 8
t:   a b c a b a b c a b d
p:     a b c a b d
       *

But it shouldn't have had to bother doing that. It already knows what the first
five characters of t are. The first one doesn't matter since the pattern is going to be
shifted past it to the right. But the next four characters, bcab, tell it something.
They are not the beginning of the pattern it is trying to match. It makes no sense
to try again to match starting with the b or with the c.

Assume that a match fails. When it does, the current value of j is exactly the
number of characters that were successfully matched before the failure was
detected. We ignore the first of those characters since we will slide the pattern at
least one character to the right and so the first matched character will never be
considered again. Call the remaining j - 1 characters the kernel. In our example,
when the first mismatch was detected, j was 5, so the kernel is bcab. Now notice
that, given a value for j, we can compute the only possible kernel just from the
pattern p. It is independent of t. Specifically, the kernel that corresponds to j is
composed of characters 1 through j - 1 of p (numbering from 0 again).

Given a kernel from the last match, how do we know how far to the right we
can slide the pattern before we have to try again to match it against t? The
answer is that we can slide the beginning of the pattern to the right until it is just
past the kernel. But then we have to slide it back to the left to account for any
overlap between the end of the kernel and the beginning of the pattern. So how
far is that? To answer that question, we do the following. Start by placing the
kernel on one line and the pattern, immediately to the right of it, on the line below
it. So we have, in our example:

     b c a b
             a b c a b d

Now slide the pattern as far to the left as it can go subject to the constraint that,
when we stop, any characters that are lined up in a single column must be identical.
So, in this example, we can slide the pattern leftward by two characters, producing:

     b c a b
         a b c a b d

Thus, given this particular pattern p, if j is five when a mismatch is detected,
then the next match we should try is the one that we get if we shift the pattern five
characters to the right minus the two overlap characters. So we slide it three
characters to the right and we try:

     0 1 2 3 4 5 6 7 8
t:   a b c a b a b c a b d
p:         a b c a b d
               *

Again remember that this analysis of sliding distance is independent of the text
string t. So we can preprocess a pattern p to determine what the overlap numbers
are for each value of j. We will store those numbers in a table we will call T. Note
that if j = 0 or 1, the corresponding kernel will be empty. For reasons that will
become clear when we see exactly how the table T is going to be used, set T[0] to -1
and T[1] to 0. For the pattern abcabd that we have been considering, T will be:


   j             0     1     2     3     4     5
   T[j]         -1     0     0     0     1     2
   the kernel    ε     ε     b     bc    bca   bcab

Now, continuing with our example, notice something else about what should
happen on the next match attempt. There were two characters of overlap between
the pattern and the kernel. That means that we already know that the first two
pattern characters match against the last two kernel characters and that those last
two kernel characters are identical to the two text characters we would look at
first. We don't need to check them again. So, each time we reposition the pattern
on the text string (thus changing the index i in the search algorithm we presented
above), we can also compute j, the first character pair we need to check. Rather
than resetting it to 0 every time, we can jump it past the known characters and
start it at the first character we actually need to check. So how far can we jump?
The answer is that the new value of j can be computed by using its previous value
as an index into T. The new value of j is exactly T[j], since the size of the overlap
is exactly the length of the substring we have already examined and thus can skip.

We  can  now  state  our  new  search  algorithm  based  on  these  two  optimizations 
(i.e.,  sliding  the  pattern  to  the  right  as  far  as  possible  and  starting  to  check  the 
next  match  as  far  to  the  right  as  possible): 

   Knuth-Morris-Pratt(t, p: strings) =
      i = 0.
      j = 0.
      While i ≤ |t| - |p| do:
         While j < |p| do:
            If t[i + j] = p[j] then j = j + 1.   /* Continue the match.
            Else exit this loop.                 /* Match failed. Need to slide the
                                                    pattern to the right.
         If j = |p| then halt and accept.        /* The entire pattern matched.
         Else:
   *        i = i + j - T[j].                    /* Slide the pattern as far as
                                                    possible to the right.
   *        j = max(0, T[j]).                    /* Start j at the first character
                                                    we actually need to check.
      Halt and reject.                           /* Checked all the way to the end
                                                    and didn't find a match.

Knuth-Morris-Pratt is identical to simple-string-search except in the two lines
marked on the left with asterisks. The only difference is in how i and j are
updated each time a new match starts.

Looking at the algorithm, it should be clear why we assigned T[0] the value -1.
If a match fails immediately, we have to guarantee that the pattern gets shifted one
character to the right for the next match. Assigning T[0] the value -1 does that.
Unfortunately though, that assignment does mean that we must treat j = 0 as a
special case in computing the next value for j. That value must be 0, not -1. Thus
the use of the max function in the expression that defines the next value for j.

Assuming that T can be computed and that it has the values shown above, we
can now illustrate the operation of Knuth-Morris-Pratt on our example. At each
iteration, we show the value of j (i.e., the position at which we start comparing the
pattern to the text), with an underline:


     0 1 2 3 4 5 6 7 8
t:   a b c a b a b c a b d        Start with i = 0, j = 0.
p:   a b c a b d
     _
               *                  Mismatch found: i = 0, j = 5.

   Compute new values for next match: i = i + j - T[j] = 0 + 5 - 2 = 3.
                                      j = max(0, T[j]) = 2.

t:   a b c a b a b c a b d
p:         a b c a b d
               _
               *                  Mismatch found immediately: i = 3, j = 2.

   Compute new values for next match: i = i + j - T[j] = 3 + 2 - 0 = 5.
                                      j = max(0, T[j]) = 0.

t:   a b c a b a b c a b d
p:             a b c a b d
               _                  Complete match will now be found.

How much we can slide the pattern each time we try a match depends on the
structure of the pattern. The worst case is a pattern like aaaaaab. Notice that
every kernel for this pattern will be a string of zero or more a's. That means that
the pattern overlaps all the way to the left on every kernel. This is going to mean
that it is never possible to slide the pattern more than one character to the right
on each new match attempt. Using the technique we described above, we can
build T (which describes the number of characters of overlap) for this pattern:


   j             0     1     2     3     4     5      6
   T[j]         -1     0     1     2     3     4      5
   the kernel    ε     ε     a     aa    aaa   aaaa   aaaaa

Now consider what happens when we run Knuth-Morris-Pratt on the following
example using this new pattern:

     012345678910 ...
t:   aaaaaaaaaaaaaaaaab        Start with i = 0, j = 0.
p:   aaaaaab
     _
           *                   Mismatch found: i = 0, j = 6.

   Compute new values for next match: i = i + j - T[j] = 0 + 6 - 5 = 1.
                                      j = max(0, T[j]) = 5.

t:   aaaaaaaaaaaaaaaaab
p:    aaaaaab
           _
            *                  Mismatch found almost immediately: i = 1, j = 6.

   Compute new values for next match: i = i + j - T[j] = 1 + 6 - 5 = 2.
                                      j = max(0, T[j]) = 5.

t:   aaaaaaaaaaaaaaaaab
p:     aaaaaab
            _
             *                 Mismatch found almost immediately: i = 2, j = 6.

This process continues, shifting the pattern one character to the right each
time, until it finds a match at the very end of the string. But notice that, even
though we weren't able to advance the pattern more than one character at each
iteration, we were able to start j out at 5 each time. So we did skip most of the
comparisons that simple-string-search would have done.
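
Both traces can be reproduced with a short Python sketch of the search loop. The table T is passed in already computed, using the book's convention that T[0] = -1. The function name, the return shape, and the recorded list of (i, j) pairs at the start of each match attempt are additions made for this illustration. The worst-case text is built here as seventeen a's followed by a b, which is an assumption about the length of the text in the example.

```python
from typing import List, Optional, Tuple

def kmp_search(t: str, p: str, T: List[int]) -> Tuple[Optional[int], List[Tuple[int, int]]]:
    """Return (index of the first match or None, the (i, j) pair at each attempt)."""
    i = j = 0
    attempts = []
    while i <= len(t) - len(p):
        attempts.append((i, j))
        while j < len(p) and t[i + j] == p[j]:
            j += 1                      # continue the match
        if j == len(p):
            return i, attempts          # the entire pattern matched
        # Slide right as far as possible, then skip the known overlap.
        # The right-hand side is evaluated with the old j before either update.
        i, j = i + j - T[j], max(0, T[j])
    return None, attempts

print(kmp_search("abcababcabd", "abcabd", [-1, 0, 0, 0, 1, 2]))
pos, attempts = kmp_search("a" * 17 + "b", "aaaaaab", [-1, 0, 1, 2, 3, 4, 5])
print(pos, attempts[:3])
```

The first call restarts at (i, j) = (0, 0), (3, 2), and (5, 0), as in the first trace. The second call begins (0, 0), (1, 5), (2, 5), matching the worst-case trace.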

Analyzing the complexity of Knuth-Morris-Pratt is straightforward. Ignore for
the moment the complexity of computing the table T. We will discuss that below.
Assuming that T has been computed, we can count the maximum number of
comparisons that will be done given a text t of length n and a pattern p of length m.
Consider each character c of t. If the first comparison of p to c succeeds, then one
of the following things must happen next:

• The rest of the pattern also matches. No further match attempts will be made
so c will never be examined again.

• Somewhere later the pattern fails. But, in that case, c becomes part of the
kernel that will be produced by that failed match. No kernel characters are ever
reexamined. So c will never be examined again.

So the number of successful comparisons is no more than n. The number of
unsuccessful comparisons is also no more than n since every unsuccessful comparison
forces the process to stop and start over, sliding the pattern at least one character
to the right. That can happen no more than n times. So the total number of
comparisons is no more than 2n and so is O(n).
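
The 2n bound can be observed on the worst-case pattern. The sketch below counts character comparisons in the search loop, with the table T supplied precomputed. The function name, the counter, and the choice of a text made of seventeen a's followed by a b are assumptions made for this illustration. For contrast it also prints m(n - m + 1), the number of comparisons simple-string-search performs on this particular input.

```python
from typing import List

def count_kmp_comparisons(t: str, p: str, T: List[int]) -> int:
    """Run the Knuth-Morris-Pratt search loop and count character comparisons."""
    comparisons = 0
    i = j = 0
    while i <= len(t) - len(p):
        while j < len(p):
            comparisons += 1
            if t[i + j] != p[j]:
                break
            j += 1
        if j == len(p):
            return comparisons
        i, j = i + j - T[j], max(0, T[j])   # both updates use the old value of j
    return comparisons

t = "a" * 17 + "b"
p = "aaaaaab"
n, m = len(t), len(p)
kmp = count_kmp_comparisons(t, p, [-1, 0, 1, 2, 3, 4, 5])
print(kmp, "comparisons; bound 2n =", 2 * n)
print("simple-string-search on the same input:", m * (n - m + 1))
```

Here the search makes 29 comparisons, under the bound of 36, while the simple algorithm makes 84.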

It remains to describe the algorithm that constructs the table T. The obvious
approach is to try matching p against each possible kernel, starting in each possible
position. But we would like a technique that is O(m), i.e., linear in the length of the
pattern. Such an algorithm exists. It builds up the entries in T one at a time starting
with T[2] (since T[0] is always -1 and T[1] is always 0). The idea is the following:
Assume that we have already considered a kernel of length k - 1 and we are now
considering one of length k. This new kernel is identical to the previous one except
that one more character from p has been added to the right. So, returning to our
first example, assume we have already processed the kernel of length 3 and
observed a one character overlap (shown in the box) with the pattern:

   kernel:    b c [a]
   pattern:       [a] b c a b d

To form the next longer kernel we add a b to the right of the previous kernel:

   kernel:    b c [a] b
   pattern:       [a] b c a b d

Notice that there is no chance that there is now an overlap that starts to the left
of the one we found at the last step. If the pattern didn't match those earlier
characters of the kernel before, it still won't. There are only three possibilities:

• The match we found at the previous step can be extended by one character.
That is what happens in this case. When this happens, the value of T for the
current kernel is one more than it was for the last one.

• The match we found on the previous step cannot be extended. In that case, we
check to see whether a new, shorter match can be started.

• Neither can the old match be extended nor a new one started. In this case, the
value of T corresponding to the current kernel is 0.

Based on this observation, we can define the following algorithm for computing
the table T:

   buildoverlap(p: pattern string) =
      T[0] = -1.
      T[1] = 0.
      j = 2.                       /* j is the index of the element of T we are
                                      currently computing. It is the entry for a
                                      kernel of length j - 1.
      k = 0.                       /* k is the length of the overlap from the
                                      previous element of T.
      While j < |p| do:            /* When j equals |p|, all elements of T
                                      have been filled in.
         Compare p[j - 1] to p[k]. /* Compare the character that just got
                                      appended to the kernel to the next
                                      character of p to see if the current
                                      match can be extended.
         If they are equal then:   /* Extend the previous overlap by one
                                      character.
            T[j] = k + 1.
            j = j + 1.             /* We know the answer for this cell and
                                      can go on to the next.
            k = k + 1.             /* The overlap length just increased by one.
         If they are not equal but /* See if a shorter match is possible,
            k > 0 then:               starting somewhere in the box that
                                      enclosed the match we had before.
            k = T[k].              /* Don't increment j since we haven't
                                      finished this entry yet.
         If they are not equal and k = 0 then:   /* No overlap exists.
            T[j] = 0.
            j = j + 1.             /* We know the answer for this cell and
                                      can go on to the next.
            k = 0.                 /* The overlap length is back to 0.

Buildoverlap executes at most 2m comparisons (where m is the length of the
pattern p). So the total number of comparisons executed by Knuth-Morris-Pratt
on a text of length n and a pattern of length m is O(n + m). Particularly if either
n or m is very large, this is a substantial improvement over simple-string-search,
which required O(nm) comparisons.
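
A direct Python rendering of buildoverlap may help in checking the two tables given earlier. This is a sketch under the book's conventions (T[0] = -1, T[1] = 0). The function name `build_overlap` and the handling of patterns shorter than two characters are choices made here.

```python
from typing import List

def build_overlap(p: str) -> List[int]:
    """Compute the overlap table T for pattern p, with T[0] = -1 and T[1] = 0."""
    if not p:
        return []
    T = [-1] + [0] * (len(p) - 1)
    j = 2          # index of the entry being computed (kernel of length j - 1)
    k = 0          # length of the overlap found for the previous entry
    while j < len(p):
        if p[j - 1] == p[k]:
            T[j] = k + 1     # the previous overlap extends by one character
            j += 1
            k += 1
        elif k > 0:
            k = T[k]         # try a shorter overlap; entry j is not finished yet
        else:
            T[j] = 0         # no overlap exists
            j += 1
    return T

print(build_overlap("abcabd"))
print(build_overlap("aaaaaab"))
```

The two calls reproduce the rows T[j] of the tables for abcabd and aaaaaab.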


27.8.2 Replacing an Exponential Algorithm with a Polynomial One

Sometimes we can get substantially greater speedup than we did in the last two
examples. We may be able to replace one algorithm with another whose asymptotic
complexity is much better. We've already seen two important examples of this:

• Given a string w, and a context-free language L, described by a grammar G, an
obvious way to decide whether w is an element of L is to try all the possible ways in which w
might be derived using the rules of G. Alternatively, we could try all paths through the
nondeterministic PDA that can be constructed from G. But both of these approaches
are O(2ⁿ). Practical parsers must be substantially more efficient than that. In Chapter
15 we saw that, for many useful context-free languages, we can build linear-time
parsers. We also saw that it is possible to retain generality and to parse an arbitrary
context-free language in O(n³) time using techniques, such as the Cocke-Kasami-Younger
algorithm and the Earley algorithm, that exploit dynamic programming.

• Given a hidden Markov model (HMM) M and an observed output sequence O, an
obvious way to determine the path through M that was most likely to have produced
O is to try all paths through M of length |O|, compute their probabilities, and then
choose the one with the highest such probability. But, letting n be |O|, this approach
is O(2ⁿ). If HMMs are to be useful, particularly in real-time applications like speech
understanding, they have to be substantially faster than that. But, again, we can
exploit dynamic programming. The Viterbi and the forward algorithms, which we
described in Section 5.11.2, run in O(k²n) time, where k is the number of states in M.

Whenever our first attempt to solve a problem yields an exponential-time
algorithm, it will be natural to try to do better. The next example is a classic case in which
that effort succeeds.

EXAMPLE  27.6  Greatest  Common  Divisor  and  Euclid's  Algorithm 

One of the earliest problems for which an efficient algorithm replaced a very
inefficient, but obvious one, is greatest common divisor (or gcd). Let n and m be
integers. Then gcd(n, m) is the largest integer k such that k is a factor of both n and m.
The obvious way to compute gcd is:

   gcd-obvious(n, m: integers) =

      1. Compute the prime factors of both n and m.
      2. Let k be the product of all factors common to n and m (including duplicates).
      3. Return k.

So, for example, the prime factors of 40 are {2, 2, 2, 5}. The prime factors of 60
are {2, 2, 3, 5}. So gcd(40, 60) = 2·2·5 = 20.

Unfortunately, no efficient (i.e., polynomial-time) algorithm for prime
factorization is known. So the obvious solution to the gcd problem is also inefficient.
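
As an illustration of gcd-obvious, the sketch below factors both numbers by trial division and multiplies the common prime factors, counting duplicates. Trial division is just one simple way to factor and is chosen here for brevity. The helper names are hypothetical, and the code assumes both inputs are positive integers.

```python
from collections import Counter

def prime_factors(n: int) -> Counter:
    """Multiset of prime factors of n (n >= 1), found by trial division."""
    factors = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:
        factors[n] += 1      # whatever remains is itself prime
    return factors

def gcd_obvious(n: int, m: int) -> int:
    """Product of the prime factors common to n and m, including duplicates."""
    common = prime_factors(n) & prime_factors(m)   # multiset intersection
    k = 1
    for prime, count in common.items():
        k *= prime ** count
    return k

print(sorted(prime_factors(40).elements()))   # [2, 2, 2, 5]
print(sorted(prime_factors(60).elements()))   # [2, 2, 3, 5]
print(gcd_obvious(40, 60))                    # 20
```

The trial-division loop runs up to the square root of n, which is exponential in the number of digits of n. That is where the inefficiency of this approach shows up.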

But there is a better way. The following technique was known to the ancient
Greeks. Although probably discovered before Euclid, one version of it appeared
in Euclid's Elements in about 300 B.C. and so the technique is commonly called
Euclid's algorithm:

   gcd-Euclid(n, m: integers) =
      If m = 0 return n.
      Else return gcd-Euclid(m, n (mod m)), where n (mod m) is the remainder
      after integer division of n by m.

To see that gcd-Euclid must eventually halt, observe that n (mod m) < m. So
the second argument to gcd-Euclid is strictly decreasing. Since it can never become
negative, it must eventually become 0. The proof that gcd-Euclid halts with the
correct result rests on the observation that, for any integers n and m, if some integer
k divides both n and m it must also divide n (mod m). To see why this is so, notice
that there exists some natural number j such that n = jm + (n (mod m)). So, if
both n and jm are divisible by k, n (mod m) must also be.

Next we analyze the time complexity of gcd-Euclid. Again, the key is that its
second argument is strictly decreasing. The issue is, “How fast?” The answer is
based on the observation that, whenever m ≤ n, n (mod m) ≤ n/2. To see why this
is so, consider two cases:

• m ≤ n/2: We have n (mod m) < m ≤ n/2 and thus n (mod m) ≤ n/2.

• m > n/2: Then n (mod m) = n - m. So n (mod m) ≤ n/2.

We note that gcd-Euclid swaps its arguments on each recursive call. So, after
each pair of calls, the second argument is cut at least in half. Thus, after at most
2·log₂ m calls, the second argument will be equal to 0 and gcd-Euclid will halt. If we
assume that each division has constant cost, then timereq(gcd-Euclid) ∈
O(log₂(max(n, m))).
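
The logarithmic growth in the number of calls can be observed directly. The sketch below is an iterative rendering of gcd-Euclid for non-negative integers. It also reports how many calls the recursive formulation would make, which is one more than the number of loop iterations. The name `gcd_euclid` and the counter are choices made for this illustration. Consecutive Fibonacci numbers such as 89 and 55 are used as a slow case.

```python
import math
from typing import Tuple

def gcd_euclid(n: int, m: int) -> Tuple[int, int]:
    """Return (gcd(n, m), number of calls the recursive version would make)."""
    calls = 1
    while m != 0:
        n, m = m, n % m     # same step as gcd-Euclid(m, n (mod m))
        calls += 1
    return n, calls

for n, m in [(40, 60), (89, 55)]:
    g, calls = gcd_euclid(n, m)
    print(f"gcd({n}, {m}) = {g} after {calls} calls; 2*log2(m) = {2 * math.log2(m):.1f}")
```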

We can turn the gcd problem into a language to be recognized by defining:

• RELATIVELY-PRIME = {<n, m> : n and m are integers and they are
relatively prime}. Two integers are relatively prime iff their gcd is 1.

The following procedure decides RELATIVELY-PRIME:

   REL-PRIMEdecide(<n, m: integers>) =
      If gcd-Euclid(n, m) = 1 then accept; else reject.

We already know that timereq(gcd-Euclid) ∈ O(log₂(max(n, m))). But recall
that the length of the string encoding of an integer k is O(log k). So, if the input
to REL-PRIMEdecide has length |<n, m>|, then max(n, m) may be 2^|<n, m>|.
Thus timereq(REL-PRIMEdecide) ∈ O(log₂(2^|<n, m>|)) = O(|<n, m>|). So
REL-PRIMEdecide runs in linear time.
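
A small sketch of the decision procedure follows, together with one possible measure of input length. The binary encoding with a single separator character, used in `encoded_length`, is an assumption made only for this illustration, as are the function names. The code assumes non-negative integers.

```python
def rel_prime_decide(n: int, m: int) -> bool:
    """Accept (True) iff gcd(n, m) = 1, using Euclid's algorithm."""
    while m != 0:
        n, m = m, n % m
    return n == 1

def encoded_length(n: int, m: int) -> int:
    """Length of a hypothetical encoding <n, m>: two binary numerals and a separator."""
    return n.bit_length() + 1 + m.bit_length()

print(rel_prime_decide(35, 64), encoded_length(35, 64))
print(rel_prime_decide(40, 60), encoded_length(40, 60))
```

Since the number of division steps is proportional to the logarithm of the larger argument, it grows in proportion to this encoded length rather than to the numeric values themselves.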


In Section 28.1, we will see other examples of problems that can be solved in an
obvious way using an exponential-time algorithm but for which more efficient,
polynomial-time algorithms also exist. But then, in Section 28.2, we'll consider a large family of
problems for which no efficient solutions are known, despite the fact that substantial
effort has gone into searching for them.

.8.3  Time-Space  Tradeoffs 

Space  efficiency  and  time  efficiency  affect  the  utility  of  an  algorithm  in  different  ways.  In 
the  early  days  of  computing,  when  memory  was  expensive,  programmers  worried  about 
small  factors  (and  even  constants)  in  the  amount  of  memory  required  by  their  programs. 
But,  in  modern  computers,  memory  is  cheap,  fast,  and  plentiful.  So  while  it  may  matter  to 
us  whether  one  program  takes  twice  as  long  as  another  one  to  run,  we  rarely  care 
whether it takes twice as much memory. That is, we don't care until our program runs
out of memory and stops dead in its tracks. Time inefficiency may lead to a graceful
degradation in system performance. Memory inefficiency may make a program's
performance “fall off a cliff.” So there are cases where we have no choice but to
choose a less time-efficient algorithm in place of a more time-efficient one because
the former uses less space. This is particularly likely to happen when we are solving
intrinsically hard problems, in other words those where, no matter what we do, the
amount of time and/or memory grows very quickly as the size of the problem increases.


EXAMPLE  27.7  Search:  Depth-First,  Breadth-First, 
and  Iterative  Deepening 

Consider the problem of searching a tree. We have discussed this problem at various
points throughout this book. For example, Theorem 17.2 tells us that, for any
nondeterministic deciding or semideciding Turing machine M, there exists an
equivalent deterministic one. The proof given in E.1 is by construction of a
deterministic machine that conducts a search through the computational paths of M.
If it finds an accepting path, then it accepts.

What  search  algorithm  shall  we  use  to  solve  problems  such  as  this? 

• Depth-first search chooses one branch and follows it until it reaches either a
solution or a dead-end. In the latter case, it backs up to the most recent decision
point from which there still exists an unexplored branch. Then it picks one such
branch and follows it. This process continues until either a solution is found or no
unexplored alternatives remain. Depth-first search is easy to implement and it
requires very little space (just a stack whose depth equals the length of the path
that is currently being considered). But depth-first search can get stuck exploring a
bad path and miss exploring a better one. For example, in the proof of Theorem 17.2,
we must consider the case in which some of M's paths do not halt. A depth-first search
could get stuck in one of them and never get around to finding some other path
that halts and accepts. So depth-first search cannot be used to solve this problem.

• Breadth-first search explores all paths to depth one, storing each of the nodes it
generates. Next it expands each of those nodes one more level, generating a new
fringe of leaf nodes. Then it returns to those leaf nodes and expands all of them
one more level. This process continues until either a solution is found or no leaf
nodes have any successors. Breadth-first search cannot get stuck since it explores
all paths of length k before considering any paths of length k + 1. But breadth-first
search must store every partial path in memory. So the amount of space it
requires grows exponentially with the depth of the search tree that it is exploring.
A Turing machine has an infinite tape, so it will never run out of room. However,
managing it and shifting its contents around are difficult. Real computers,
though, have finite memory. So, for practical problems, breadth-first search can
work very well as long as the available memory is adequate for storing all the
partial paths. As soon as it is not, the search process unceremoniously stops.

• Iterative deepening is a compromise between breadth-first search and depth-first
search. It first explores all paths of length 1 using depth-first search. Then it
starts over and explores all paths of length 2 using depth-first search. And then
all paths of length 3, and so forth. Whenever it finds a solution, at some depth, it
halts. The space complexity of iterative deepening is the same as for depth-first
search. And its time complexity is only slightly worse than that of breadth-first
search. This may seem counterintuitive, since, for each k, the search to depth k
starts over; it doesn’t use any of the results from the search to depth k - 1. We
present the algorithm in detail in E.1, and we analyze its complexity in E.2. In a
nutshell, the reason that starting the search over every time isn't such a bad idea
is that the top part of the search tree is the part that must be generated many
times. But the top part is very small compared to the bottom part. Iterative
deepening is the technique that we use to prove Theorem 17.2.
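
The following is a minimal sketch of breadth-first search and iterative deepening on a tree that is given implicitly by a successor function. It is not the algorithm of E.1; the function names, the `successors` and `is_goal` parameters, and the `max_depth` cutoff are all assumptions made for illustration. Note that this version begins with a depth limit of 0 (the root alone) rather than 1.

```python
from collections import deque


def depth_limited(node, successors, is_goal, limit):
    """Depth-first search that never descends more than `limit` levels below node."""
    if is_goal(node):
        return node
    if limit == 0:
        return None
    for child in successors(node):
        found = depth_limited(child, successors, is_goal, limit - 1)
        if found is not None:
            return found
    return None


def iterative_deepening(root, successors, is_goal, max_depth=50):
    """Rerun depth-limited search with limits 0, 1, 2, ... until a goal is found."""
    for limit in range(max_depth + 1):
        found = depth_limited(root, successors, is_goal, limit)
        if found is not None:
            return found
    return None


def breadth_first(root, successors, is_goal):
    """Explore level by level; the fringe holds every generated, unexpanded node."""
    fringe = deque([root])
    while fringe:
        node = fringe.popleft()
        if is_goal(node):
            return node
        fringe.extend(successors(node))
    return None
```

As a usage example, take nodes to be strings over {a, b}, with `successors = lambda s: [s + "a", s + "b"]` and the goal `s == "bab"`. An unbounded depth-first search that always tries the `a` child first would follow the infinite path a, aa, aaa, ... and never return, whereas both functions above return `"bab"`. Iterative deepening keeps only the current path on the recursion stack, while the breadth-first fringe roughly doubles with each level.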


Exercises 

1.  Let  M  be  an  arbitrary  Turing  machine. 

a. Suppose that $\mathit{timereq}(M) = 3n^3(n + 5)(n - 4)$. Circle all of the following
statements that are true:

i. $\mathit{timereq}(M) \in O(n)$.

ii. $\mathit{timereq}(M) \in O(n^6)$.

iii. $\mathit{timereq}(M) \in O(n^5/50)$.

iv. $\mathit{timereq}(M) \in \Theta(n^6)$.

b. Suppose that $\mathit{timereq}(M) = 5^n \cdot 3n^3$. Circle all of the following statements that
are true:

i. $\mathit{timereq}(M) \in O(n^5)$.

ii. $\mathit{timereq}(M) \in O(2^n)$.

iii. $\mathit{timereq}(M) \in O(n!)$.

2. Show a function f, from the natural numbers to the reals, that is $O(1)$ but that is
not constant.

3.  Assume  the  definitions  of  the  variables  given  in  the  statement  of  Theorem  27.1. 
Prove  that  if  s  >  1  then: 

0(ri2")Q0(2{n\ 

4. Prove that, if 0 < a < b, then $n^b \notin O(n^a)$.

5. Let M be the Turing machine shown in Example 17.9. M accepts the language
$WcW = \{wcw : w \in \{a, b\}^*\}$. Analyze timereq(M).

6. Assume a computer that executes $10^{10}$ operations/second. Make the simplifying
assumption that each operation of a program requires exactly one machine
instruction. For each of the following programs P, defined by its time requirement,
what is the largest size input on which P would be guaranteed to halt within a week?

a. $\mathit{timereq}(P) = 5243n + 649$.

b. $\mathit{timereq}(P) = 5n^2$.

c. $\mathit{timereq}(P) = 5^n$.

7. Let each line of the following table correspond to a problem for which two
algorithms, A and B, exist. The table entries correspond to timereq for each of those
algorithms. Determine, for each problem, the smallest value of n (the length of
the input) such that algorithm B runs faster than algorithm A.

    A           B
    $n^2$       $572n + 4171$
    $n^2$       $1000n \log_2 n$
    $n!$        $450n^2$
    $n!$        $3^n + 2$

8. Show that $L = \{\langle M\rangle : M$ is a Turing machine and $\mathit{timereq}(M) \in O(n^2)\}$ is not
in SD.

9. Consider the problem of multiplying two n × n matrices. The straightforward
algorithm multiply computes C = A · B by computing the value for each element
of C using the formula:

$$C_{i,j} = \sum_{k=1}^{n} A_{i,k} B_{k,j} \quad \text{for } i, j = 1, \ldots, n.$$

Multiply uses n multiplications and n - 1 additions to compute each of the $n^2$
elements of C. So it uses a total of $n^3$ multiplications and $n^3 - n^2$ additions. Thus
$\mathit{timereq}(\mathit{multiply}) \in O(n^3)$.
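
The operation counts just stated can be confirmed with a short sketch of the straightforward algorithm. The Python function below and its two counters are illustrative additions; the counters exist only to tally scalar multiplications and additions.

```python
def multiply(A, B):
    """Multiply two n x n matrices (lists of lists) the straightforward way.

    Returns (C, mults, adds), where mults and adds count the scalar
    multiplications and additions that were performed.
    """
    n = len(A)
    mults = adds = 0
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # First product needs no addition; each later one adds to the total.
            total = A[i][0] * B[0][j]
            mults += 1
            for k in range(1, n):
                total += A[i][k] * B[k][j]
                mults += 1
                adds += 1
            C[i][j] = total
    return C, mults, adds
```

For n = 3 this performs 27 multiplications and 27 - 9 = 18 additions, matching $n^3$ and $n^3 - n^2$.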

We observe that any algorithm that performs at least one operation for each
element of C must take at least $n^2$ steps. So we have an $n^2$ lower bound and an $n^3$
upper bound on the complexity of matrix multiplication. Because matrix
multiplication plays an important role in many kinds of applications (including, as we
saw in Section 15.3.2, some approaches to context-free parsing), the question
naturally arose, “Can we narrow that gap?” In particular, does there exist a better
than $O(n^3)$ matrix multiplication algorithm? In [Strassen 1969], Volker Strassen
showed that the answer to that question is yes.

Strassen’s algorithm exploits a divide-and-conquer strategy in which it computes
products and sums of smaller submatrices. Assume that $n = 2^k$, for some
$k \ge 1$. (If it is not, then we can make it so by expanding the original matrix with
rows and columns of zeros, or we can modify the algorithm presented here and
divide the original matrix up differently.) We begin by dividing A, B, and C into
2 × 2 blocks. So we have:

$$A = \begin{bmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{bmatrix}, \quad
B = \begin{bmatrix} B_{1,1} & B_{1,2} \\ B_{2,1} & B_{2,2} \end{bmatrix}, \quad \text{and} \quad
C = \begin{bmatrix} C_{1,1} & C_{1,2} \\ C_{2,1} & C_{2,2} \end{bmatrix},$$

where each $A_{i,j}$, $B_{i,j}$, and $C_{i,j}$ is a $2^{k-1} \times 2^{k-1}$ matrix.

With this decomposition, we can state the following equations that define the
values for each element of C:

$$C_{1,1} = A_{1,1}B_{1,1} + A_{1,2}B_{2,1}.$$
$$C_{1,2} = A_{1,1}B_{1,2} + A_{1,2}B_{2,2}.$$
$$C_{2,1} = A_{2,1}B_{1,1} + A_{2,2}B_{2,1}.$$
$$C_{2,2} = A_{2,1}B_{1,2} + A_{2,2}B_{2,2}.$$

So far, decomposition hasn’t bought us anything. We must still do eight
multiplications and four additions, each of which must be done on matrices of size
$2^{k-1}$. Strassen’s insight was to define the following seven equations:

$$Q_1 = (A_{1,1} + A_{2,2})(B_{1,1} + B_{2,2}).$$
$$Q_2 = (A_{2,1} + A_{2,2})B_{1,1}.$$
$$Q_3 = A_{1,1}(B_{1,2} - B_{2,2}).$$
$$Q_4 = A_{2,2}(B_{2,1} - B_{1,1}).$$
$$Q_5 = (A_{1,1} + A_{1,2})B_{2,2}.$$
$$Q_6 = (A_{2,1} - A_{1,1})(B_{1,1} + B_{1,2}).$$
$$Q_7 = (A_{1,2} - A_{2,2})(B_{2,1} + B_{2,2}).$$

These equations can then be used to define the values for each element of C as follows:

$$C_{1,1} = Q_1 + Q_4 - Q_5 + Q_7.$$
$$C_{1,2} = Q_3 + Q_5.$$
$$C_{2,1} = Q_2 + Q_4.$$
$$C_{2,2} = Q_1 - Q_2 + Q_3 + Q_6.$$

Now, instead of eight matrix multiplications and four matrix additions, we do
only seven matrix multiplications, but we must also do eighteen matrix additions
(where a subtraction counts as an addition). We’ve replaced twelve matrix
operations with 25. But matrix addition can be done in $O(n^2)$ time, while matrix
multiplication remains more expensive.

Strassen's algorithm applies these formulas recursively, each time dividing
each matrix of size $2^k$ into four matrices of size $2^{k-1}$. The process halts when
k = 1. (Efficient implementations of the algorithm actually stop the recursion
sooner and use the simpler multiply procedure on small submatrices. We’ll see
why in part (e) of this problem.) We can summarize the algorithm as follows:

Strassen(A, B, k: where A and B are matrices of size $2^k$) =

    If k = 1 then compute the Q's using scalar arithmetic. Else, compute them as follows:

        $Q_1$ = Strassen(($A_{1,1} + A_{2,2}$), ($B_{1,1} + B_{2,2}$), k - 1).

        $Q_2$ = Strassen(($A_{2,1} + A_{2,2}$), $B_{1,1}$, k - 1).

        ...            /* Compute all the Q matrices as described above.

        $Q_7$ =

    $C_{1,1}$ =

    ...                /* Compute all the C matrices as described above.

    $C_{2,2}$ =

    Return C.
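
A runnable sketch of this recursion, using the seven Q equations and four C equations given above, might look as follows. It is an illustration rather than an efficient implementation: matrices are plain lists of lists, the helper names (`mat_add`, `mat_sub`, `split`) and the one-element `counter` list are assumptions of this sketch, and the recursion runs down to 1 × 1 blocks, which is equivalent to computing the Q's with scalar arithmetic when k = 1.

```python
def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mat_sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M):
    """Return the four quadrants M11, M12, M21, M22 of a square matrix."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B, counter):
    """Multiply two 2^k x 2^k matrices; counter[0] tallies scalar multiplications."""
    if len(A) == 1:
        counter[0] += 1
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    Q1 = strassen(mat_add(A11, A22), mat_add(B11, B22), counter)
    Q2 = strassen(mat_add(A21, A22), B11, counter)
    Q3 = strassen(A11, mat_sub(B12, B22), counter)
    Q4 = strassen(A22, mat_sub(B21, B11), counter)
    Q5 = strassen(mat_add(A11, A12), B22, counter)
    Q6 = strassen(mat_sub(A21, A11), mat_add(B11, B12), counter)
    Q7 = strassen(mat_sub(A12, A22), mat_add(B21, B22), counter)
    C11 = mat_add(mat_sub(mat_add(Q1, Q4), Q5), Q7)
    C12 = mat_add(Q3, Q5)
    C21 = mat_add(Q2, Q4)
    C22 = mat_add(mat_add(mat_sub(Q1, Q2), Q3), Q6)
    # Reassemble the four quadrants into one matrix.
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

On the 2 × 2 inputs [[1, 2], [3, 4]] and [[5, 6], [7, 8]], the seven products are 65, 35, -2, 8, 24, 22, and -30, which combine to give [[19, 22], [43, 50]] using seven scalar multiplications instead of eight.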

In the years following Strassen’s publication of his algorithm, newer ones that use
even fewer operations have been discovered. The fastest known technique is the
Coppersmith-Winograd algorithm, whose time complexity is $O(n^{2.376})$. But it is too
complex to be practically useful. There do exist algorithms with better performance
than Strassen, but, since it opened up this entire line of inquiry, we should understand
its complexity. In this problem, we will analyze timereq of Strassen and compare it to
timereq of the standard algorithm multiply. We should issue two caveats before we
start, however: The analysis that we are about to do just counts scalar multiplies and
adds. It does not worry about such things as the behavior of caches and the use of
pipelining. In practice, it turns out that the crossover point for Strassen relative to
multiply is lower than our results suggest. In addition, Strassen may not be as
numerically stable as multiply is, so it may not be suitable for all applications.

a. We begin by defining mult'(k) to be the number of scalar multiplications that
will be performed by Strassen when it multiplies two $2^k \times 2^k$ matrices. Similarly,
let add'(k) be the number of scalar additions. Describe both mult'(k) and
add'(k) inductively by stating their value for the base case (when k = 1) and
then describing their value for k > 1 as a function of their value for k - 1.

b. To find closed form expressions for mult'(k) and add'(k) requires solving the
recurrence relations that were given as answers in part (a). Solving the one for
mult'(k) is easy. Solving the one for add'(k) is harder. Prove that the following
are correct:

$$\mathit{mult}'(k) = 7^k.$$

$$\mathit{add}'(k) = 6 \cdot (7^k - 4^k).$$

c. We’d like to define the time requirement of Strassen, when multiplying two
n × n matrices, as a function of n, rather than as a function of $\log_2 n$, as we
have been doing. So define mult(n) to be the number of multiplications that
will be performed by Strassen when it multiplies two n × n matrices. Similarly,
let add(n) be the number of additions. Using the fact that $k = \log_2 n$, state
mult(n) and add(n) as functions of n.

d. Determine values of α and β, each less than 3, such that $\mathit{mult}(n) \in O(n^{\alpha})$ and
$\mathit{add}(n) \in O(n^{\beta})$.

e. Let ops(n) = mult(n) + add(n) be the total number of scalar multiplications
and additions that Strassen performs to multiply two n × n matrices. Recall
that, for the standard algorithm multiply, this total operation count is $2n^3 - n^2$.
We’d like to find the crossover point, i.e., the point at which Strassen performs
fewer scalar operations than multiply does. So find the smallest value of k such
that $n = 2^k$ and $\mathit{ops}(n) < 2n^3 - n^2$. (Hint: Once you have an equation that
describes the relationship between the operation counts of the two algorithms,
just start trying candidates for k, starting at 1.)

10. In this problem, we will explore the operation of the Knuth-Morris-Pratt string
search algorithm that we described in Example 27.5. Let p be the pattern cbacbcc.

a. Trace the execution of buildoverlap and show the table T that it builds.

b. Using T, trace the execution of Knuth-Morris-Pratt(cbaccbacbcc, cbacbcc).


CHAPTER  28 


Time  Complexity  Classes 


Some problems are easy. For example, every regular language can be decided in
linear time (by running the corresponding DFSM). Some problems are harder.
For example, the best known algorithm for deciding the Traveling Salesman
language TSP-DECIDE takes, in the worst case, time that grows exponentially in the
size of the input. In this chapter, we will define a hierarchy of language classes based
on the time required by the best known decision algorithm.


28.1  The  Language  Class  P 

The first important complexity class that we will consider is the class P, which includes
all and only those languages that are decidable by a deterministic Turing machine in
polynomial time. So we have:

The Class P: $L \in P$ iff there exists some deterministic Turing machine M that
decides L and $\mathit{timereq}(M) \in O(n^k)$ for some constant k.

It is common to think of the class P as containing exactly the tractable problems. In
other words, it contains those problems that are not only solvable in principle (i.e., they
are decidable) but also solvable in an amount of time that makes it reasonable to
depend on solving them in real application contexts.

Of course, suppose that the best algorithm we have for deciding some language L is
$O(n^{1000})$ (i.e., its running time grows at the same rate, to within a constant factor, as
$n^{1000}$). It is hard to imagine using that algorithm on anything except a toy problem. But
the empirical fact is that we don’t tend to find algorithms of this sort. Most problems of
practical interest that are known to be in P can be solved by programs that are no
worse than $O(n^3)$ if we are analyzing running times on conventional (random access)
computers. And so they’re no worse than $O(n^{18})$ when run on a one-tape, deterministic
Turing machine. Furthermore, it often happens that, once some polynomial time
algorithm is known, a faster one will be discovered. For example, consider the problem of
matrix multiplication. If we count steps on a random access computer, the obvious
algorithm for matrix multiplication (based on Gaussian elimination) is $O(n^3)$. Strassen’s
algorithm is more efficient; it is $O(n^{2.81})$. Other algorithms whose asymptotic
complexity is even lower (approaching $O(n^2)$) are now known, although they are
substantially more complex.

So, as we consider languages that are in P, we will generally discover algorithms
whose time requirement is some low-order polynomial function of the length of the
input. But we should be clear that not all languages in P have this property. In Section
28.9.1, we'll describe the time hierarchy theorems. One consequence of the
Deterministic Time Hierarchy Theorem is that, for any integer k > 1, there are languages
that can be decided by a deterministic Turing machine in $O(n^k)$ time but not in
$O(n^{k-1})$ time. It just happens that if k = 5000, we are unlikely to care.

Going the other direction, if we have a problem that we cannot show to be in P, is it
necessarily intractable in practice? Often it is. But there may be algorithms that solve it
quickly most of the time or that solve it quickly and return the right answer most of the
time. For example, prior to the recent proof (which we will mention in Section 28.1.7)
that primality testing can be done in polynomial time, randomized algorithms that
performed primality testing efficiently were known and commonly used. We’ll return to
this approach in Chapter 30.


28.1.1  Closure  of  P  under  Complement 

One important property of the class P is that, if a language L is in P, so is its complement:

THEOREM 28.1 P is Closed Under Complement

Theorem: The class P is closed under complement.

Proof: For any language L, if $L \in P$ then there exists some deterministic Turing
machine M that decides L in polynomial time. From M, we can build a new
deterministic Turing machine M′ that decides ¬L in polynomial time. We use the
same construction that we used to prove Theorem 20.4 (which tells us that the
decidable languages are closed under complement). M′ is simply M with
accepting and nonaccepting states swapped. M′ will always halt in exactly the
same number of steps M would take and it will accept ¬L.

For many problems that we are likely to care about, this closure theorem doesn't
give us exactly the result we need. For example, we'll show below that the language
CONNECTED = {<G> : G is an undirected graph and G is connected} is in P. We’d
like then to be able to conclude that the related language NOTCONNECTED =
{<G> : G is an undirected graph and G is not connected} is also in P. But we have the
same problem that we had in analyzing languages that are defined in terms of a Turing
machine's behavior.

• ¬CONNECTED = NOTCONNECTED ∪ {strings that are not syntactically legal
descriptions of undirected graphs}.

If, however, we can check for legal syntax in polynomial time, then we can consider
the universe with respect to which the complement of CONNECTED is computed to
be just those strings whose syntax is legal. Then we can conclude that NOTCONNECTED
is in P if CONNECTED is. In all the examples we will consider in this book,
such a syntax check can be done in polynomial time. So we will consider the
complement of some language L to be the language consisting of strings of the correct
syntax but without the property that defines L.
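
The convention just described can be sketched in a few lines. Here a decider is modeled as a Python function from strings to booleans; `complement_within_syntax`, `decide`, and `is_well_formed` are hypothetical names introduced only for this illustration. If both `decide` and `is_well_formed` run in polynomial time, so does the returned decider, since it calls each at most once.

```python
from typing import Callable

Decider = Callable[[str], bool]


def complement_within_syntax(decide: Decider, is_well_formed: Decider) -> Decider:
    """Build a decider for the complement of L relative to well-formed strings.

    The result accepts w iff w has legal syntax and decide(w) rejects.
    Ill-formed strings are rejected, so they belong to neither language.
    """
    def decide_complement(w: str) -> bool:
        if not is_well_formed(w):
            return False
        # Swap the accept/reject answer of the original decider.
        return not decide(w)

    return decide_complement


# Toy usage: L = decimal strings that denote even numbers.
def is_decimal(w: str) -> bool:
    return w != "" and all(ch in "0123456789" for ch in w)


def is_even(w: str) -> bool:
    return w[-1] in "02468"


is_odd = complement_within_syntax(is_even, is_decimal)
```

In the toy usage, `is_odd("7")` accepts, `is_odd("8")` rejects, and `is_odd("x7")` rejects because the string is not a legal decimal numeral.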


28.1.2  Languages  That  Are  in  P 

We  have  already  discussed  many  examples  of  languages  that  are  in  P: 

• Every regular language is in P since every regular language can be decided in linear
time. We’ll prove this claim as Theorem 28.2 below.

• Every context-free language is in P since there exist context-free parsing (and
deciding) algorithms that run in $O(n^3)$ time on a conventional computer (and thus
run in $O(n^{18})$ time on a single-tape Turing machine). We'll prove this claim as
Theorem 28.3 below.

• Some languages that are not context-free are also in P. One example of such
a language is $A^nB^nC^n = \{a^nb^nc^n : n \ge 0\}$. In Example 27.1, we analyzed M,
the Turing machine that we had built to decide $A^nB^nC^n$. We showed that
$\mathit{timereq}(M) = 2(n/3)(n/3 + n/3 + n/6) + n$, which, as we showed in Example 27.3,
is in $O(n^2)$.


The  game  of  Nim  (appropriately  encoded  as  a  decision  problem  in  which  we 
ask  whether  there  is  a  guaranteed  win  for  the  current  player)  is  in  P.  But  it 
appears  that  very  few  “interesting”  games  are  in  P.  (N.2) 


Many other languages are also in P. In the rest of this section, we show examples of
proofs that languages are in P. If we can construct a one-tape, deterministic Turing
machine M that decides some language L in polynomial time, then we have a proof that
L is in P. But we will generally find it substantially easier just to describe a decision
procedure as it would be implemented on a conventional, random access computer. Then
we can appeal to Theorem 17.4, which tells us that a deterministic random access
program that executes t steps can be simulated by a seven-tape Turing machine in $O(t^3)$
steps. We also showed, in Theorem 17.1, that t steps of a k-tape Turing machine can be
simulated in $O(t^2)$ steps of a standard Turing machine. Composing these results, we have
that if a random access program runs in t steps, it can be simulated by a standard Turing
machine in $O(t^6)$ steps. Since the composition of two polynomials is a polynomial, if we
have a random access algorithm that runs in polynomial time, then it can be simulated in
a polynomial number of steps on a deterministic one-tape Turing machine and so the
language that it decides is in P.
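
As a worked instance of this composition (a sketch that simply plugs a cubic running time into the two simulation bounds quoted above), suppose that a random access program decides L in $t(n) \in O(n^3)$ steps. Then:

```latex
\begin{align*}
\text{random access program:}\quad & t(n) \in O(n^3) \\
\text{seven-tape Turing machine:}\quad & O(t^3) = O\big((n^3)^3\big) = O(n^9) \\
\text{one-tape Turing machine:}\quad & O\big((n^9)^2\big) = O(n^{18})
\end{align*}
```

This is where the $O(n^{18})$ bounds quoted for cubic-time random access algorithms come from.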

We’ll make one other simplifying assumption as well. It takes $O(n^2)$ steps to compare
two strings of length n on a one-tape Turing machine. It takes $O(\log^2 n)$ to compare two
integers of size n (since their string descriptions have length log n). We can do similar
analyses of the time required to perform arithmetic operations on numbers of size n.
The key is that all of these operations can be performed in polynomial time. So, if our
goal is to show that an algorithm runs in polynomial time, we can assume that all such
operations are performed in constant time (which many of them are on real computers).
While not strictly true, this assumption will have no effect on any claim we may make
that an algorithm runs in polynomial time.


28.1.3  Regular  and  Context-Free  Languages 

THEOREM  28.2  Every  Regular  Language  can  be  Decided  in  Linear  Time 

Theorem: Every regular language can be decided in linear time. So every regular
language is in P.

Proof: Given any regular language L, there exists some deterministic finite
state machine $M = (K, \Sigma, \delta, s, A)$ that decides L. From M, we can construct
a deterministic Turing machine $M' = (K \cup \{s', y, n\}, \Sigma, \Sigma \cup \{\Box\}, \delta', s', \{y\})$
that decides L in linear time. Roughly, M′ simply simulates the steps of M,
moving its read/write head one square to the right at each step and making no
change to the tape. When M′ reads a □, it halts. If it is in an accepting state, it
accepts; otherwise it rejects. So, if (q, a, p) is a transition in M, M′ will contain the
transition ((q, a), (p, a, →)). Because of our convention that the read/write
head of M′ will be just to the left of the first input character when it begins, M′
will need a new start state, s′, in which it will read a □ and move right to the first
input character. Also, since FSMs halt when they run out of input, while Turing
machines halt only when they enter an explicit halting state, M′ will need two
new states: y, which will halt and accept, and n, which will halt and reject. Finally,
M′ will need transitions into y and n labeled □. So, if q is a state in M (and thus
also in M′) and q is an accepting state in M, M′ will contain the transition
((q, □), (y, □, →)). If, on the other hand, q is not an accepting state in M, M′ will
contain the transition ((q, □), (n, □, →)).

On any input of length n, M′ will execute n + 2 steps. So $\mathit{timereq}(M') \in O(n)$.
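
The step count in this proof can be mirrored by a small simulation. The sketch below does not build a Turing machine; it runs a DFSM given as a transition dictionary and charges one step for each move that M′ would make. The function name, the dictionary encoding of δ, and the example machine are assumptions made for illustration.

```python
def run_dfsm_as_tm(delta, start, accepting, w):
    """Simulate DFSM M on w, counting the steps the Turing machine M' would take.

    delta maps (state, character) pairs to states. Returns (accepted, steps).
    """
    steps = 1            # s': read the blank left of the input and move right
    state = start
    for ch in w:
        state = delta[(state, ch)]   # one right move per input character
        steps += 1
    steps += 1           # read the blank after the input and enter y or n
    return state in accepting, steps


# Example machine (hypothetical): accept strings over {a, b} with an even number of a's.
EVEN_AS = {
    ("even", "a"): "odd", ("odd", "a"): "even",
    ("even", "b"): "even", ("odd", "b"): "odd",
}
```

For the input `abab`, the simulation ends in state `even` after 4 + 2 = 6 counted steps, so the string is accepted in time linear in its length.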


It  is  significant  that  every  regular  language  can  be  decided  in  linear  time.  But  the 
fact  that  every  regular  language  is  in  P  is  also  a  consequence  of  the  more  general  fact 
that  we  prove  next:  Every  context-free  language  is  in  P. 

THEOREM  28.3  Every  Context-Free  Language  is  in  P 

Theorem: Every context-free language can be decided in $O(n^{18})$ time. So every
context-free language is in P.

Proof: In Chapter 14, we showed that every context-free language is decidable.
Unfortunately, neither of the algorithms that we presented there (decideCFLusingGrammar
and decideCFLusingPDA) runs in polynomial time. But, in Section 15.3
we presented the Cocke-Kasami-Younger (CKY) algorithm, which can parse
any context-free language in time that is $O(n^3)$ if we count operations on a
conventional computer. That algorithm can be simulated on a standard, one-tape
Turing machine in $O(n^{18})$ steps.
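
To make the $O(n^3)$ claim concrete, here is a minimal sketch of a CKY-style recognizer. It is not the presentation from Section 15.3; it assumes a grammar already in Chomsky normal form, supplied as two hypothetical lists (`unary` rules X → c and `binary` rules X → Y Z). The three nested loops over substring length, start position, and split point account for the cubic running time when the grammar size is treated as a constant.

```python
def cky_accepts(w, unary, binary, start="S"):
    """Return True iff the CNF grammar derives the nonempty string w.

    unary:  list of (X, c) pairs for rules X -> c
    binary: list of (X, (Y, Z)) pairs for rules X -> Y Z
    """
    n = len(w)
    if n == 0:
        return False  # this sketch ignores the empty string
    # table[i][length] holds the nonterminals that derive w[i : i + length].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(w):
        table[i][1] = {X for X, c in unary if c == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for X, (Y, Z) in binary:
                    if Y in left and Z in right:
                        table[i][length].add(X)
    return start in table[0][n]


# Example grammar (an assumption of this sketch) for {a^n b^n : n >= 1}:
#   S -> A B | A X,   X -> S B,   A -> a,   B -> b
UNARY = [("A", "a"), ("B", "b")]
BINARY = [("S", ("A", "B")), ("S", ("A", "X")), ("X", ("S", "B"))]
```

With this grammar, `cky_accepts("aabb", UNARY, BINARY)` succeeds because the table records S for `ab`, X for `abb`, and finally S for the whole string, while `aab` leaves the top cell empty and is rejected.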


28.1.4  Connected  Graphs 

We  next  consider  languages  that  describe  significant  properties  of  graphs.  One  of  the 
simplest  questions  we  can  ask  about  a  graph  is  whether  it  is  connected.  A  graph  is 
connected  iff  there  exists  a  path  from  each  vertex  to  each  other  vertex.  We  consider 
here  the  problem  for  undirected  graphs.  (We  can  also  define  the  related  language  for 
directed  graphs.)  Define  the  language: 

•  CONNECTED  =  {<G>  :  G  is  an  undirected  graph  and  G  is  connected}. 

THEOREM  28.4  The  Problem  of  Identifying  Connected  Graphs  is  in  P 
Theorem:  CONNECTED  is  in  P. 

Proof: We prove that CONNECTED is in P by exhibiting a deterministic,
polynomial-time algorithm that decides it:

connected(<G : graph with vertices V and edges E>) =

    1. Set all vertices to be unmarked.

    2. Mark vertex 1.

    3. Initialize L (a list that will contain vertices that have been marked but
       whose successors have not yet been examined) to contain just vertex 1.

    4. Initialize marked-vertices-counter to 1.

    5. Until L is empty do:

        5.1. Remove the first element from L. Call it current-vertex.

        5.2. For each edge e that has current-vertex as an endpoint do:

             Call the other endpoint of e next-vertex. If next-vertex is not already
             marked then do:

                 Mark next-vertex.

                 Add next-vertex to L.

                 Increment marked-vertices-counter by 1.

    6. If marked-vertices-counter = |V| accept; else reject.

Connected will mark and count the vertices that are reachable from vertex 1.
Since G is undirected, if there is a path from vertex 1 to some vertex n, then
there is also a path from vertex n back to vertex 1. So, if there is a path from
vertex 1 to every other vertex, then there is a path from every other vertex
back to vertex 1 and from there to each other vertex. Thus G is connected. If,
on the other hand, there is some vertex that is not reachable from vertex 1, G
is unconnected.

So it remains to show that the runtime of connected is some polynomial function
of |<G>|:

• Step 1 takes time that is $O(|V|)$.

• Steps 2, 3, and 4 each take constant time.

• The outer loop of step 5 can be executed at most |V| times since no vertex can
be put on L more than once.

• Step 5.1 takes constant time.

• The loop in step 5.2 can be executed at most |E| times. Each time through,
it requires at most $O(|V|)$ time (depending on how the vertices are
represented and marked).

• Step 6 takes constant time.

So the total time required to execute connected is $|V| \cdot O(|E|) \cdot O(|V|) =
O(|V|^2|E|)$. But note that $|E| \le |V|^2$. So the time required to execute connected
is $O(|V|^4)$.
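
The procedure connected can be sketched directly in Python. The representation is an assumption of this sketch: vertices are numbered 1 through `num_vertices` (with at least one vertex) and edges are pairs of vertex numbers. Using adjacency lists and a set of marked vertices makes each step cheaper than the generous bounds used in the proof, but the polynomial bound is all that membership in P requires.

```python
from collections import deque


def connected(num_vertices, edges):
    """Return True iff the undirected graph on vertices 1..num_vertices is connected."""
    adjacent = {v: [] for v in range(1, num_vertices + 1)}
    for u, v in edges:
        adjacent[u].append(v)
        adjacent[v].append(u)

    marked = {1}             # steps 1 and 2: only vertex 1 is marked
    pending = deque([1])     # step 3: marked vertices not yet examined (the list L)
    marked_count = 1         # step 4

    while pending:           # step 5
        current = pending.popleft()            # step 5.1
        for next_vertex in adjacent[current]:  # step 5.2
            if next_vertex not in marked:
                marked.add(next_vertex)
                pending.append(next_vertex)
                marked_count += 1

    return marked_count == num_vertices        # step 6
```

For example, the path graph with edges (1, 2), (2, 3), (3, 4) is accepted, while the graph with edges (1, 2) and (3, 4) is rejected because vertices 3 and 4 are never marked.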


28.1.5  Eulerian  Paths  and  Circuits 

The Seven Bridges of Königsberg problem is inspired by the geography of a town
once called Königsberg, in Germany, now called Kaliningrad, in Russia. The town
straddled the banks of the river Pregel and there were two islands in the river. There
were seven bridges connecting the river banks and the islands as shown in Figure 28.1.
The problem is this: Can a citizen of Königsberg take a walk through the town (starting
anywhere she likes) and cross each bridge exactly once?

In 1736, Leonhard Euler showed that the answer to this question is no. To prove this,
he abstracted the map to a graph whose vertices correspond to the land masses and
whose edges correspond to the bridges between them. So, in Euler's representation,
the town becomes the graph shown in Figure 28.2. Vertices 1 and 2 represent the river
banks and vertices 3 and 4 represent the two islands.


FIGURE 28.1 The Seven Bridges of Königsberg.

FIGURE 28.2 The Seven Bridges of Königsberg as a graph.


We can now restate the Seven Bridges of Königsberg problem as, "Does there exist
a path through the graph such that each edge is traversed exactly once?"

Generalizing  to  an  arbitrary  graph,  we  give  the  following  definitions: 

•  An  Eulerian  path  through  a  graph  G  is  a  path  that  traverses  each  edge  in  G  exactly 
once. 

•  An  Eulerian  circuit  through  a  graph  G  is  a  path  that  starts  at  some  vertex  s,  ends 
back  in  s,  and  traverses  each  edge  in  G  exactly  once.  (Note  the  difference  between  an 
Eulerian  circuit  and  a  Hamiltonian  one:  An  Eulerian  circuit  visits  each  edge  exactly 
once.  A  Hamiltonian  circuit  visits  each  vertex  exactly  once.) 


Bridge  inspectors,  road  cleaners,  and  network  analysts  can  minimize  their 
effort  if  they  traverse  their  systems  by  following  an  Eulerian  circuit.  (1.2) 


We’d  now  like  to  determine  the  computational  complexity  of  deciding,  given  an 
arbitrary  graph  G,  whether  or  not  it  possesses  an  Eulerian  path  (or  circuit).  Both 
questions  can  be  answered  with  a  similar  technique,  so  we’ll  pick  the  circuit  problem 
and  define  the  following  language: 

• EULERIAN-CIRCUIT = {<G> : G is an undirected graph and G contains an
Eulerian circuit}.

We'll show next that EULERIAN-CIRCUIT is in P. The algorithm that we will
use to prove this claim is based on an observation that Euler made in studying the
Königsberg bridge problem. Define the degree of a vertex to be the number of edges
with it as an endpoint. For example, in the Königsberg graph, vertices 1, 2, and 4 have
degree 3. Vertex 3 has degree 5. Euler observed that:

•  A  connected  graph  possesses  an  Eulerian  path  that  is  not  a  circuit  iff  it  contains 
exactly  two  vertices  of  odd  degree.  Those  two  vertices  will  serve  as  the  first  and  last 
vertices  of  the  path. 

• A connected graph possesses an Eulerian circuit iff all its vertices have even degree.
Because each vertex has even degree, any path that enters it can also leave it without
reusing an edge.

It should now be obvious why Euler knew (without explicitly exploring all possible
paths) that there existed no path that crossed each of the Königsberg bridges exactly once:
all four vertices of the Königsberg graph have odd degree.

THEOREM 28.5 The Problem of Finding an Eulerian Circuit in a Graph is in P
Theorem:  EULERIAN-CIRCUIT  is  in  P. 

Proof:  We  prove  that  EULERIAN-CIRCUIT  is  in  P  by  exhibiting  a  deterministic, 
polynomial-time  algorithm  that  decides  it: 

Eulerian(<G:  graph  with  vertices  V  and  edges  E>)  = 

1. If connected(G) rejects, reject (since an unconnected graph cannot have
an Eulerian circuit). Else:

2. For each vertex v in G do:

2.1. Count the number of edges that have v as one endpoint but not
both.

2.2. If the count is odd, exit the loop and reject.

3.  If  all  counts  are  even,  accept. 

The  correctness  of  Eulerian  follows  from  Euler’s  observations  as  stated  above. We 
show  that  Eulerian  runs  in  polynomial  time  as  follows: 

• We showed in the proof of Theorem 28.4 that connected runs in time that is
polynomial in |<G>|.

• The loop in step 2 is executed at most |V| times. Each time through, it requires
time that is O(|E|).

•  Step  3  takes  constant  time. 

So the total time required to execute steps 2 through 3 of Eulerian is |V| · O(|E|).
But |E| ≤ |V|^2. So the time required to execute steps 2-3 of Eulerian is O(|V|^3).
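As a rough illustration, Eulerian can be sketched in Python. The function name and the edge-list representation are assumptions made here, and the connectivity test is an ordinary graph search rather than the connected procedure itself. Parallel edges are allowed (the Königsberg graph needs them), and, as in step 2.1, an edge that has v as both endpoints is not counted.

```python
from collections import defaultdict


def has_eulerian_circuit(vertices, edges):
    """Return True iff the undirected multigraph has an Eulerian circuit.

    Assumed representation: `vertices` is a list of distinct labels and
    `edges` is a list of (u, v) pairs (repeats model parallel edges).
    """
    vertices = list(vertices)
    if not vertices:
        return True  # assumption: the empty graph is accepted

    # Step 1: reject if the graph is not connected.
    adjacency = defaultdict(list)
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen = {vertices[0]}
    stack = [vertices[0]]
    while stack:
        for neighbor in adjacency[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    if len(seen) != len(vertices):
        return False

    # Step 2: every vertex must have an even count of edges that have it
    # as exactly one endpoint.
    for v in vertices:
        count = sum(1 for a, b in edges if (a == v) != (b == v))
        if count % 2 == 1:
            return False

    # Step 3: all counts are even.
    return True
```

The edge list `[(1, 3), (1, 3), (2, 3), (2, 3), (3, 4), (1, 4), (2, 4)]` is one encoding of the Königsberg graph that is consistent with the degrees quoted in the text (3, 3, 5, 3); the sketch rejects it.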


28.1.6  Minimum  Spanning  Trees* 

Consider  an  arbitrary  undirected  graph  G.  A  spanning  tree  T  of  G  is  a  subset  of  the 
edges  of  G  such  that: 


•  T  contains  no  cycles  and 

•  Every  vertex  in  G  is  connected  to  every  other  vertex  using  just  the  edges  in  T. 

An unconnected graph (i.e., a graph in which there exist at least two vertices with no
path  between  them)  has  no  spanning  trees.  A  connected  graph  G  will  have  at  least  one 
spanning  tree;  it  may  have  many. 

Define  a  weighted  graph  to  be  a  graph  that  has  a  weight  (a  number)  associated  with 
each  edge.  Typically  the  weight  represents  some  sort  of  cost  or  benefit  associated  with 
traversing  the  edge.  Define  an  unweighted  graph  to  be  a  graph  that  does  not  associate 
weights  with  its  edges. 

If  G  is  a  weighted  graph,  we  can  compare  the  spanning  trees  of  G  by  defining  the  cost 
of  a  tree  to  be  the  sum  of  the  costs  (weights)  of  its  edges.  Then  a  tree  T  is  a  minimum 
spanning  tree  of  G  iff  it  is  a  spanning  tree  and  there  is  no  other  spanning  tree  whose  cost 
is lower than that of T. Note that, if all edge costs are positive, a minimum spanning tree
is also a minimum cost subgraph that connects all the vertices of G since any connected
subgraph that contains cycles (i.e., any connected subgraph that is not a tree) must have
higher cost than T does.


The cheapest way to lay cable that connects a set of points is along a minimum
spanning tree that connects those points. (1.2)


EXAMPLE  28.1  A  Minimum  Spanning  Tree 


Let  G  be  the  following  graph,  in  which  the  edge  costs  are  shown  in  parentheses 
next  to  each  edge: 


[Figure: the graph G with its edge costs.]


Given a connected graph G, how shall we go about trying to find a minimum spanning
tree for it? The most obvious thing to do is to try all subgraphs of G. We can reject
any that do not connect all of G's vertices. Of the remaining ones, we can choose the
one with the lowest total cost. This procedure works but does not run in time that is
polynomial in the size of G. Can we do better? The answer is yes.

One  of  the  simplest  reasonable  algorithms  for  finding  a  minimum  spanning  tree  is 
Kruskal’s  algorithm,  defined  as  follows: 

Kruskal(G: connected graph with vertices V and edges E) =

1. Sort the edges in E in ascending order by their cost. Break ties arbitrarily.

2. Initialize T to a forest with an empty set of edges.

3. Until all the edges in E have been considered do:

3.1. Select e, the next edge in E. If the endpoints of e are not connected in T
then add e to T.

4. Return T.

To show that Kruskal's algorithm finds a minimum spanning tree, we must show that
the graph T that it returns is a tree (i.e., it is connected and it contains no cycles), that it
is a spanning tree (i.e., that it includes all the vertices of the original graph G), and that
there is no lower cost spanning tree of G. T cannot contain cycles because step 3.1 can
add an edge only if its two endpoints are not already connected. T must be connected
and it must be a spanning tree because we assumed that the input graph G is connected.
This means that if we used all of G's edges, every one of G's vertices would be in T and
T would be connected. But we do use all of G's edges except ones whose endpoints are
already connected in T.
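The four steps of Kruskal can be sketched in Python as follows. The representation is an assumption made for illustration: edges are (cost, u, v) triples, and the forest T is tracked by giving every vertex a reference to the set of vertices in its current tree, so that two endpoints are connected in T exactly when they share the same set.

```python
def kruskal(vertices, weighted_edges):
    """Return the edges of a minimum spanning tree of a connected graph.

    Assumed representation: `weighted_edges` is a list of (cost, u, v).
    """
    # Step 2: T starts as a forest with no edges, one tree per vertex.
    component = {v: {v} for v in vertices}
    tree = []
    # Step 1: consider edges in ascending order of cost (ties arbitrary).
    for cost, u, v in sorted(weighted_edges, key=lambda edge: edge[0]):
        # Step 3.1: add the edge only if it joins two different trees.
        if component[u] is not component[v]:
            tree.append((cost, u, v))
            merged = component[u] | component[v]
            for w in merged:
                component[w] = merged
    # Step 4.
    return tree
```

On the hypothetical graph with edges `(1, a, b)`, `(2, b, c)`, `(3, a, c)`, `(4, c, d)` and `(5, b, d)`, the sketch keeps the edges of cost 1, 2 and 4 and skips the two edges that would close a cycle.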

So  all  that  remains  is  to  prove  that  we  have  found  a  minimum  spanning  tree. 
Kruskal’s  algorithm  is  an  example  of  a  greedy  algorithm:  It  attempts  to  find  an  optimal 
global  solution  by  grabbing  the  best  (local)  pieces,  in  this  case  short  edges,  and  putting 
them  together.  Greedy  algorithms  tend  to  run  quickly  because  they  may  do  little  or  no 
search. But they cannot always be guaranteed to find the best global solution. For example,
there exist greedy algorithms for solving the traveling salesman problem. Although
they  may  produce  fairly  reasonable  solutions  quickly,  they  cannot  be  guaranteed  to  find 
a  shortest  path. 

It turns out, however, that Kruskal's algorithm is guaranteed to find a minimum
spanning tree. To see why, we make the following observation, which holds for any
graph G with a single minimum spanning tree. It can be extended, with a bit more
complexity, to graphs that have multiple minimum spanning trees. Suppose that Kruskal
generated a tree T_K that is not a minimal spanning tree. Then there was a first point at
which it inserted an edge (n, m) that prevented T_K from being the same as some
minimal spanning tree. Pick a minimum spanning tree that is identical to T_K up to, but
not including, that point, and call it T_min. Because T_min is a spanning tree, it must
contain exactly one path between n and m. That path does not contain the edge (n, m).
Suppose that we add it. T_min now contains a cycle. That cycle must contain some edge
e that Kruskal would have considered after considering (n, m) (since otherwise it
would have been chosen instead of (n, m) as a way to connect n and m). Thus the
weight of that edge must be at least the weight of (n, m). Remove e from T_min. Call
the result T_min'. T_min' is a spanning tree. It contains the edge (n, m) instead of the
edge e. Since the weight of (n, m) is less than or equal to the weight of e, the weight
of T_min' must be less than or equal to the weight of T_min. But we assumed that T_min
was minimal, so it can't be less. It must, therefore, be equal. But then adding (n, m)
did not prevent T_K, the tree that Kruskal built, from being minimal. This contradicts
the assumption that it did.

We are now ready to ask the question, "How computationally hard is finding a
minimum spanning tree?" Since this is an optimization problem, we'll use the same
technique we used to convert the traveling salesman problem into the decision problem
TSP-DECIDE: We'll give a cost bound and ask whether there exists a minimum
spanning tree whose cost is less than the bound we provide.

Define  the  language: 

MST = {<G, cost> : G is an undirected graph with a positive cost attached to
each of its edges and there exists a minimum spanning tree of G with total cost
less than cost}.

THEOREM 28.6 The Problem of Finding a Minimum Spanning Tree
with an Acceptable Cost is in P

Theorem: MST is in P.

Proof:  We  prove  that  MST  is  in  P  by  exhibiting  a  polynomial  time  algorithm  that 
decides  it: 

MSTdecide(<G: graph with vertices V and edges E, cost: number>) =

1. Invoke Kruskal(G). Let T be the minimum spanning tree that is returned.

2.  If  the  total  cost  of  T  <  cost  then  accept,  else  reject. 

MSTdecide  runs  in  polynomial  time  if  each  of  its  two  steps  does.  Step  2  can  be 
done  in  constant  time. 

So  it  remains  to  analyze  Kruskal’s  algorithm,  which  we  do  as  follows: 

•  Step  1,  the  sorting  step,  can  be  done  with  |E|  •  log  |E|  comparisons  and  each 
comparison  takes  constant  time. 

•  Step  2  takes  constant  time. 

• The loop in step 3 is executed |E| times. The time required at each step to test
whether two vertices are disconnected depends on the data structure that is used
to represent T. A straightforward way to do it is to maintain, for each tree in the
forest T, a set that contains exactly the vertices that are present in that tree. Each
vertex in V will be in at most one such set. So, in considering an edge (n, m), we
examine each of the sets. If we find one that contains n, we look just in that set to
see whether m is also there. If it is, then n and m are already connected; otherwise
they are not. To find n, we may have to look through all the sets, so we may have
to examine |V| vertices and, if all the vertices are in the same set, we might have
to do that again to look for m. So we might examine O(|V|) vertices to do the
check for disconnectedness. Further, if we take this approach, then we must maintain
these sets. But, even doing that, the cost of adding e to T is constant. So step
3 takes a total number of steps that is O(|E| · |V|).

So the total time required to execute Kruskal's algorithm is O(|E| · |V|) and so
O(|<G>|^2). With a more efficient implementation of step 3, it is possible to show [13]
that it is also O(|E| · log |V|).

Kruskal's  algorithm  proves  that  MST  is  in  P.  And  it  is  very  easy  to  implement. 
There  also  exist  other  algorithms  for  finding  minimum  spanning  trees  that  run 
even faster than Kruskal's algorithm does.
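MSTdecide itself can be sketched in Python. The sketch below is one possible realization, not the construction used in the proof: it folds Kruskal's loop into the decider and replaces the per-tree vertex sets with a union-find (disjoint-set) structure, a standard choice for a more efficient step 3. All names and the (cost, u, v) edge representation are assumptions made for illustration. Unlike the algorithm in the proof, it also rejects explicitly when G turns out to be unconnected, since such a graph has no spanning tree.

```python
def mst_decide(vertices, weighted_edges, cost):
    """Accept iff the graph has a spanning tree of total cost < `cost`."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the representative of v's tree,
        # shortening the path along the way (path halving).
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    total = 0
    edges_used = 0
    for weight, u, v in sorted(weighted_edges, key=lambda edge: edge[0]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:        # endpoints not yet connected in T
            parent[root_u] = root_v  # merge the two trees
            total += weight
            edges_used += 1

    if edges_used != len(parent) - 1:
        return False  # G is not connected, so it has no spanning tree
    return total < cost
```

With this structure the connectivity tests are cheap enough that the initial sort dominates the running time of the sketch.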


28.1.7 Primality Testing

Prime numbers have fascinated mathematicians since the time of the ancient
Greeks.  It  turns  out  that  some  key  problems  involving  prime  numbers  are  known  to  be 
solvable  in  polynomial  time,  while  some  others  are  not  now  known  to  be. 


[13] For a proof of this claim, see [Cormen et al. 2001].

Prime numbers are of more than theoretical interest. They play a critical role
in modern encryption systems. (J.3)


In Example 27.6, we introduced the language:

• RELATIVELY-PRIME = {<n, m> : n and m are integers and they are relatively
prime}. Recall that two integers are relatively prime iff their greatest common
divisor is 1.


THEOREM 28.7 RELATIVELY-PRIME is in P

Theorem:  RELATIVELY-PRIME  is  in  P. 

Proof: RELATIVELY-PRIME can be decided in linear time by the algorithm
REL-PRIMEdecide, described in Example 27.6.
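Example 27.6 is not reproduced in this section. As a reminder of the idea, one common way to decide RELATIVELY-PRIME is Euclid's algorithm for the greatest common divisor, sketched below in Python. The function name is hypothetical and the sketch is not claimed to be identical to REL-PRIMEdecide.

```python
def relatively_prime(n, m):
    """Return True iff gcd(n, m) = 1, using Euclid's algorithm."""
    n, m = abs(n), abs(m)
    while m != 0:
        # gcd(n, m) = gcd(m, n mod m); the second argument shrinks each time.
        n, m = m, n % m
    return n == 1  # n now holds the greatest common divisor
```

For instance, 35 and 64 are relatively prime, while 12 and 18 share the divisor 6.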


But  now  consider  the  problem  of  determining  whether  or  not  a  number  is  prime.  We 
have  encoded  that  problem  as  the  language: 

• PRIMES = {w : w is the binary encoding of a prime number}.

The obvious way to decide PRIMES is, when given the number k, to consider all the
natural numbers between 2 and √k. Check each to see whether it divides evenly into
k. If any such number does, then k isn't prime. If none does, then k is prime. The time
required to implement this approach is O(√k). But n, the length of the string that
encodes k, is log k. So this simple algorithm is O(2^(n/2)). Because of the practical
significance of primality testing, particularly in cryptography, substantial effort has been
devoted to finding a more efficient technique for deciding PRIMES. It turns out that
there exist randomized algorithms that can decide PRIMES in polynomial time if we
allow an exceedingly small, but nonzero, probability of making an error. We'll describe
such an approach in Chapter 30. Such techniques are widely used in practice.
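The obvious trial-division procedure can be sketched in Python as follows (the function name is an assumption). The loop runs about √k times, so for an n-bit input it may perform on the order of 2^(n/2) divisions, which is why this approach does not show that PRIMES is in P.

```python
import math


def is_prime_trial_division(k):
    """Decide primality of the natural number k by trial division."""
    if k < 2:
        return False
    # Try every candidate divisor from 2 up to the integer square root of k.
    for divisor in range(2, math.isqrt(k) + 1):
        if k % divisor == 0:
            return False  # found a divisor, so k is composite
    return True
```

To apply it to a string w in binary, convert first: `is_prime_trial_division(int("1101", 2))` tests the number 13.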

Until very recently, however, the question of whether PRIMES is in P (i.e., whether
a provably correct, polynomial-time algorithm for it exists) remained unanswered and
it continued to be of theoretical interest. We now know the answer to the question. We
can state it as the following theorem:

THEOREM  28.8  PRIMES  is  in  P 

Theorem:  PRIMES  is  in  P. 

Proof: Various proofs of this claim have been proposed. Most have relied on
hypotheses that, although widely believed to be true, remained unproven. But
[Agrawal, Kayal and Saxena 2004] contains a proof that relies on no unproven
assumptions. It describes an algorithm for deciding PRIMES that runs in
deterministic O((log n)^12 · f(log(log n))) time, where f is a polynomial. The details
of the proof are beyond the scope of this book. Since the original algorithm was
described, modifications of it that further improve its performance have been
discovered.


The  class  P  is  closed  under  complement.  So  we  also  have  that  the  following  language 
is  in  P: 

COMPOSITES = {w : w is the binary encoding of a composite number}. A
composite number is a natural number greater than 1 that is not prime.

Unfortunately, the results we have just presented do not close the book on the problem
of working with prime and composite numbers. We now know that there exists a
polynomial-time algorithm to check whether a number is prime and we continue to
exploit randomized algorithms to answer the question in practice. The fact that we can, in
polynomial time, tell whether or not a number is prime does not tell us that there exists a
polynomial-time algorithm to discover the factors of a number that is not prime. No
efficient algorithm for factoring using a conventional computer is currently known. Were a
practical and efficient algorithm to be discovered, modern encryption techniques that
rely on factorization would no longer be effective. One approach to constructing such an
algorithm is to exploit quantum computing. Shor's algorithm, for example, factors a
number k in O((log k)^3) time on a quantum computer. But the largest number that has
so far been able to be factored on a quantum computer is 15.


28.2  The  Language  Class  NP 

Now suppose that, in our quest for polynomial-time deciding Turing machines, we
allow nondeterminism. Will this increase the number of languages for which it is
possible to build polynomial-time deciders? No one knows. But it appears likely that it does.
For example, consider again the traveling salesman language TSP-DECIDE = {w of
the form <G, cost>, where <G> encodes an undirected graph with a positive
distance attached to each of its edges and G contains a Hamiltonian circuit whose total
cost is less than cost}. Recall that a Hamiltonian circuit is a path that starts at some
vertex s, ends back in s, and visits each other vertex in G exactly once. We know of no
deterministic Turing machine that can decide TSP-DECIDE in polynomial time. But
there is a nondeterministic Turing machine that does. It works by using nondeterminism
to guess the best path. (We'll describe it in detail below.)

28.2.1  Defining  the  Class  NP 

TSP-DECIDE is typical of a large class of problems that are of considerable practical
interest.  All  of  them  share  the  following  three  properties: 

1. The problem can be solved by searching through a space of partial solutions
(such as routes), looking for a complete solution that satisfies all of the given
constraints. The size of the space that must be explored in this way grows
exponentially with the size of the problem that is being considered.

2. No better (i.e., not based on search) technique for finding an exact solution is
known.

3. But, if a proposed solution were suddenly to appear, it could be checked for
correctness very efficiently.

The next language class that we will define is called NP. It will include TSP-DECIDE
and its cousins, as well as the "easier" languages that are also in P. In Section 28.5.1, we'll
define a subset of NP called NP-complete. It will include only those languages, like
TSP-DECIDE, that are the "hardest" of the NP languages.

Properties  1  and  3  suggest  two  superficially  quite  different  ways  to  define  NP.  It 
turns  out  that  the  two  definitions  are  equivalent.  Because  each  of  them  is  useful  in 
some  contexts,  we  provide  them  both. 

Nondeterministic  Deciding 

The first definition we present is based on the idea of search. Nondeterministic Turing
machines perform search. So we will define the class NP to include all and only those
languages that are decidable by a nondeterministic Turing machine in polynomial time.
(The name NP stands for Nondeterministic Polynomial.) Remember that in defining
the time requirement of a nondeterministic Turing machine M, we don't consider the
total number of steps that M executes on all of its computational paths; instead we
measure just the length of its longest path. Thus we'll say that a language L is in NP iff
there  is  some  nondeterministic  Turing  machine  M  that  decides  L  and  the  length  of  the 
longest  computational  path  that  M  must  follow  on  any  input  of  length  n  grows  as  some 
polynomial  function  of  n.  So  we  have: 

The Class NP: L ∈ NP iff there exists some nondeterministic Turing machine M
that decides L and timereq(M) ∈ O(n^k) for some constant k.

Again consider the language TSP-DECIDE. Given a string w = <G, cost>, we
can build a nondeterministic Turing machine M that decides whether w is in
TSP-DECIDE. M's job is to decide whether there is a Hamiltonian circuit through G
whose cost is less than cost. M will nondeterministically guess a path through G with
length equal to the number of vertices in G. There is a finite number of such paths and
each of them has finite length, so all paths of M will eventually halt. M will accept w if
it finds at least one path that corresponds to a Hamiltonian circuit whose cost is less
than cost. Otherwise it will reject. We will show below that timereq(M) ∈ O(n). So
TSP-DECIDE ∈ NP.

Deterministic  Verifying 

But now suppose that (somehow) we find ourselves in possession of a particular path
c, along with the claim that c proves that w = <G, cost> is in TSP-DECIDE. In other
words, there is a claim that c is a Hamiltonian circuit through G with cost less than
cost. Our only job now is to check c and verify that it does in fact prove that w is in
TSP-DECIDE. We can build a deterministic Turing machine M' that does this in time
that is polynomial in the length of w. The input to M' will be the string <G, cost, c>.
Assume that c is represented as a sequence of vertices. Then M' will simply walk
through c, one vertex at a time, checking that G does in fact contain the required edge
from each vertex to its successor. As it does this, it will keep track of the length of c so
far. If that length ever exceeds the number of vertices in G, M' will halt and reject.
Also, as it goes along, it will use a list of the vertices in G and mark each one as it is
visited, checking to make sure that every vertex is visited exactly once. And, finally, it
will keep a running total of the costs of the edges that it follows. If every step of c
follows an edge in G, every vertex in G is visited once except that c starts and ends at the
same vertex, and the cost of c is less than cost, M' will accept its input and thus report
that c does in fact prove that w is in TSP-DECIDE. Otherwise M' will reject its input
and thus report that c fails to do that. (But note that this doesn't mean that w is not in
TSP-DECIDE; we know only that c fails to show that it is.) We'll call M' a verifier for
TSP-DECIDE. We can analyze the complexity of M' as follows: Let n be the number
of vertices in G. Then M' executes its outer loop at most n times (since it will quit if
the length of c exceeds n). As it checks each step in c, it may take O(|<G>|^2) steps to
check the edges of G, to mark the visited vertices, and to update the cost total. So the
total time for the main loop is O(|<G>|^3). The final check takes time that is
O(|<G>|^2). So the total is O(|<G>|^3).
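The checks performed by the verifier M' can be sketched in Python. This is an illustration at a much higher level than a Turing machine: the function name, the dictionary that maps each undirected edge (a `frozenset` of its two endpoints) to its cost, and the list encoding of the certificate c are all assumptions made here.

```python
def verify_tsp_certificate(vertices, edge_costs, cost, circuit):
    """Check whether `circuit` proves that <G, cost> is in TSP-DECIDE.

    `circuit` is a sequence of vertices that should start and end at the
    same vertex and visit every other vertex exactly once.
    """
    n = len(vertices)
    # Reject certificates of the wrong length or that do not close up.
    if len(circuit) != n + 1 or circuit[0] != circuit[-1]:
        return False
    # n entries covering all n distinct vertices: each visited exactly once.
    if set(circuit[:-1]) != set(vertices):
        return False
    total = 0
    for u, v in zip(circuit, circuit[1:]):
        edge = frozenset((u, v))
        if edge not in edge_costs:
            return False  # c uses an edge that G does not contain
        total += edge_costs[edge]  # running total of the costs followed
    return total < cost
```

A rejection says only that this particular certificate fails. It says nothing about whether some other circuit would succeed.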

So we have M, a nondeterministic polynomial time decider for TSP-DECIDE.
And we have M', a deterministic polynomial time verifier for it. The relationship
between M and M' is typical for problems like TSP-DECIDE. The decider works by
nondeterministically searching a space of candidate structures (in the case of
TSP-DECIDE, candidate paths) and accepting iff it finds at least one that meets the
requirements  imposed  by  the  language  that  is  being  decided.  The  verifier  works  by 
simply  checking  a  single  candidate  structure  (e.g.,  a  path)  and  verifying  that  it  meets 
the  language's  requirements. 

The existence of verifiers like M' suggests an alternative way to define the class NP.
We first define exactly what we mean by a verifier: A Turing machine V is a verifier for
a language L iff:

w ∈ L iff ∃c (<w, c> ∈ L(V)).

We'll call c, the candidate structure that we provide to the verifier V, a certificate.
Think of it as a certificate of proof that w is in L. So V verifies L precisely in case it
accepts at least one certificate for every string in L and accepts no certificate for any
string that is not in L. Since the string we are actually interested in is w, we will define
timereq(V), when V is a verifier, as a function just of |w|, not of |<w, c>|.

Now,  using  the  idea  of  a  verifier,  we  can  state  the  following  alternative  definition  for 
the  class  NP: 

The Class NP: L ∈ NP iff there exists a deterministic Turing machine V such that
V is a verifier for L and timereq(V) ∈ O(n^k) for some constant k (i.e., V is a
deterministic polynomial-time verifier for L).

Note that, since the number of steps that a polynomial-time V executes is bounded
by some polynomial function of the length of the input string w, the number of
certificate characters it can look at is also bounded by the same function. So, when we are
considering polynomial time verifiers, we will consider only certificates whose length is
bounded by some polynomial function of the length of the input string w.

The Two Definitions are Equivalent

Now that we have two definitions for the class NP, we would like to be able to use
whichever one is more convenient. However, it is not obvious that the two definitions
are equivalent. So we must prove that they are.

THEOREM  28.9  The  Two  Definitions  of  the  Class  NP  are  Equivalent 

Theorem:  The  following  two  definitions  are  equivalent: 

1. L ∈ NP iff there exists a nondeterministic, polynomial-time Turing machine
that decides it.

2. L ∈ NP iff there exists a deterministic, polynomial-time verifier for it.

Proof:  We  must  prove  that  if  there  exists  a  nondeterministic  decider  for  L  then 
there  exists  a  deterministic  verifier  for  it,  and  vice  versa: 

1. Let L be a language that is in NP by definition 1. Then there exists a
nondeterministic, polynomial-time Turing machine M that decides it. Using M, we
construct V, a deterministic polynomial time verifier for L. On the input <w, c>,
V will simulate M running on w except that, every time M would have to make
a choice, V will simply follow the path that corresponds to the next symbol of
c. V will accept iff M would have accepted on that path. Thus V will accept iff c
is a certificate for w. V runs in polynomial time because the length of the
longest path M can follow is bounded by some polynomial function of the
length of w. So V is a deterministic polynomial-time verifier for L.

2. Let L be a language that is in NP by definition 2. Then there exists a
deterministic Turing machine V such that V is a verifier for L and
timereq(V) ∈ O(n^k) for some k. Using V, we construct a nondeterministic
polynomial-time Turing machine M that will decide L. On input w, M will
nondeterministically select a certificate c whose length is at most
timereq(V)(|w|), the greatest number of steps that V could execute on an
input string of length |w|. (It need not consider any longer certificates since V
would not be able to evaluate them.) It will then run V on <w, c> and accept
iff V accepts. M follows a finite number of computational paths, each of which
halts in time that is O(n^k). So M is a nondeterministic polynomial-time Turing
machine that decides L.
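The second construction can be made concrete with a small Python sketch. A conventional program cannot guess, so the sketch simulates M deterministically by trying every certificate up to the length bound; each candidate corresponds to one computational path of M. The number of candidates grows exponentially with the bound, so the sketch illustrates the construction without yielding a polynomial-time deterministic decider. The names `decide_by_search` and `verify_composite` are hypothetical. As a toy verifier, a certificate for COMPOSITES is taken to be the decimal digits of a nontrivial divisor.

```python
from itertools import product


def decide_by_search(verifier, w, alphabet, max_certificate_length):
    """Accept w iff some certificate within the length bound is accepted."""
    for length in range(max_certificate_length + 1):
        # Each tuple of symbols plays the role of one guess made by M.
        for symbols in product(alphabet, repeat=length):
            if verifier(w, "".join(symbols)):
                return True
    return False


def verify_composite(k, certificate):
    """Toy verifier: the certificate spells a divisor d with 1 < d < k."""
    if not certificate.isdigit():
        return False  # also rejects the empty certificate
    d = int(certificate)
    return 1 < d < k and k % d == 0
```

Calling `decide_by_search(verify_composite, 91, "0123456789", 2)` accepts because the certificate "7" is accepted, while the same call with 97 rejects after every certificate of length at most 2 has been tried.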

In the next section we will see several examples of languages that are in NP.
Theorem 28.9 tells us that we can prove a claim of the form, "L is in NP" by exhibiting
for L either a nondeterministic polynomial-time decider or a deterministic
polynomial-time verifier.


28.2.2  Languages  That  Are  in  NP 

The class NP is important because it contains many languages that arise naturally in a
variety of applications. We'll mention several here. None of these languages is known
to be in P. In fact, all of them are in the complexity class NP-complete, which contains
the hardest NP languages. We'll define NP-completeness in Section 28.5.

TSP-DECIDE  is  typical  of  a  large  class  of  graph-based  languages  that  are  in  NP. 
This  class  includes: 

•  HAMILTONIAN-PATH  =  {<G>  :  G  is  an  undirected  graph  and  G  contains  a 
Hamiltonian  path}.  A  Hamiltonian  path  through  G  is  a  path  that  visits  each  vertex 
in  G  exactly  once. 

• HAMILTONIAN-CIRCUIT = {<G> : G is an undirected graph and G contains
a Hamiltonian circuit}. A Hamiltonian circuit is a path that starts at some vertex s,
ends back in s, and visits each other vertex in G exactly once.

• CLIQUE = {<G, k> : G is an undirected graph with vertices V and edges E, k is
an integer, 1 ≤ k ≤ |V|, and G contains a k-clique}. A clique in G is a subset of V
with the property that every pair of vertices in the clique is connected by some edge
in E. A k-clique is a clique that contains exactly k vertices.

NP  includes  other  kinds  of  languages  as  well. Typically  a  language  is  in  NP  if  you  can 
imagine  deciding  it  by  exploring  a  well-defined  search  space  looking  for  at  least  one 
value  that  meets  some  clear  requirement.  So,  for  example,  the  following  language 
based  on  an  important  property  of  Boolean  logic  formulas  is  in  NP: 

•  SAT  =  {<w>  :  w  is  a  wff  in  Boolean  logic  and  w  is  satisfiable}.  We  can  show  that 
a  string  w  is  in  SAT  by  finding  a  satisfying  assignment  of  values  to  the  variables  in 
the  wff  that  it  encodes. 

Sets  of  almost  any  type  can  lead  to  problems  that  are  in  NP.  We’ve  just  mentioned 
examples  based  on  graphs  (sets  of  vertices  and  edges)  and  on  logical  wffs  (sets  of  vari¬ 
ables  connected  by  operators). The  following  NP  language  is  based  on  sets  of  integers: 

• SUBSET-SUM = {<S, k> : S is a multiset (i.e., duplicates are allowed) of integers,
k is an integer, and there exists some subset of S whose elements sum to k}.
For example:

• <{1256, 45, 1256, 59, 34687, 8946, 17664}, 35988> is in SUBSET-SUM
(since 34687 + 1256 + 45 = 35988).

• <{101, 789, 5783, 6666, 45789, 996}, 29876> is not in SUBSET-SUM.
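To see why SUBSET-SUM is in NP, it helps to write down a verifier. The sketch below is a minimal one, under the assumption that S is given as a Python list and that a certificate c is a list of distinct positions in that list. The function name subset_sum_verify and this encoding are ours, chosen for illustration.

```python
def subset_sum_verify(s, k, c):
    """Accept iff c lists distinct positions of s whose elements sum to k.

    s is a list (a multiset, so values may repeat). Using positions
    rather than values lets a certificate select one copy of a
    repeated value without selecting the other.
    """
    if len(set(c)) != len(c):                    # no position may be used twice
        return False
    if any(i < 0 or i >= len(s) for i in c):     # every position must exist
        return False
    return sum(s[i] for i in c) == k
```

For the first example above, the certificate [0, 1, 4] selects 1256, 45, and 34687, and the verifier accepts. The work done is linear in the length of the certificate, apart from the cost of the additions.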


The SUBSET-SUM problem can be used as the basis for a simple encryption
system that could be used, for example, to store password files. We start with
a set of, say, 1000 integers. Call them the base integers. Then suppose that each
password can be converted (for example by looking at pairs of symbols) to a
multiset of base integers. Then a password checker need not store actual
passwords. It can simply store the sum of the base integers that the password
generates. When a user enters a password, it is converted to base integers and
the sum is computed and checked against the stored sum. But if hackers
break in and get access to the stored password sums, they won't be able to
reconstruct any of the passwords, even if they know how passwords are
mapped to base integers, unless they can (reasonably efficiently) take a sum
and find a subset of the base integers that add to form it.


The next example of an NP language is based on sets of anything as long as the
objects have associated costs:

• SET-PARTITION = {<S> : S is a multiset (i.e., duplicates are allowed) of objects,
each of which has an associated cost, and there exists a way to divide S into two subsets,
A and S − A, such that the sum of the costs of the elements in A equals the sum of
the costs of the elements in S − A}.
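A verifier for SET-PARTITION can be sketched in the same way. We assume here that S is represented just by the list of its objects' costs and that a certificate names the positions of the objects placed in A. Both the name set_partition_verify and this representation are illustrative assumptions.

```python
def set_partition_verify(costs, c):
    """Accept iff the positions in c select a part A whose total cost
    equals the total cost of the remaining objects, S - A."""
    if len(set(c)) != len(c):                        # each object goes in A at most once
        return False
    if any(i < 0 or i >= len(costs) for i in c):     # every position must exist
        return False
    cost_of_a = sum(costs[i] for i in c)
    # cost(A) == cost(S - A) exactly when 2 * cost(A) == cost(S).
    return 2 * cost_of_a == sum(costs)
```

Comparing twice the cost of A with the total avoids building S − A explicitly.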


SET-PARTITION  arises  in  many  sorts  of  resource  allocation  contexts.  For 
example,  suppose  that  there  are  two  production  lines  and  a  set  of  objects  that 
need  to  be  manufactured  as  quickly  as  possible.  Let  the  objects’  costs  be  the 
time  required  to  make  them.  Then  the  optimum  schedule  divides  the  work 
evenly across the two machines. Load balancing in a dual processor computer
system can also be described as a set-partition problem.


Our  final  example  is  based  on  sets  of  anything  as  long  as  the  objects  have  associated 
costs  and  values: 

• KNAPSACK = {<S, v, c> : S is a set of objects each of which has an associated
cost  and  an  associated  value,  v  and  c  are  integers,  and  there  exists  some  way  of 
choosing  elements  of  S  (duplicates  allowed)  such  that  the  total  cost  of  the  chosen 
objects  is  at  most  c  and  their  total  value  is  at  least  v}.  Notice  that,  if  the  cost  of  each 
item  equals  its  value,  then  the  KNAPSACK  problem  becomes  very  similar  to  the 
SUBSET-SUM  problem. 
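The KNAPSACK definition also translates directly into a verifier. In the sketch below we assume that each object is a (cost, value) pair of nonnegative integers and that a certificate gives, for each object, how many copies are chosen (since duplicates are allowed). The name knapsack_verify and this encoding are ours.

```python
def knapsack_verify(items, v, c, counts):
    """items: list of (cost, value) pairs; counts[i]: copies of item i chosen.

    Accept iff the chosen copies cost at most c in total and are worth
    at least v in total.
    """
    if len(counts) != len(items) or any(n < 0 for n in counts):
        return False
    total_cost = sum(n * cost for n, (cost, _) in zip(counts, items))
    total_value = sum(n * value for n, (_, value) in zip(counts, items))
    return total_cost <= c and total_value >= v
```

With items [(3, 4), (4, 5), (2, 3)], v = 10, and c = 7, the certificate [1, 0, 2] is accepted: its total cost is 7 and its total value is 10.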


The KNAPSACK problem derives its name from the problem of choosing
the best way to pack a knapsack with limited capacity in such a way as to
maximize the utility of the contents. For example, imagine a thief trying to
decide what to steal or a backpacker trying to decide what food to take. The
KNAPSACK problem arises in a wide variety of applications in which resources
are limited and utility must be maximized. For example, what ads
should a company buy? What products should a factory make? How should
a company expand its workforce?


In the next three sections we'll prove that TSP-DECIDE, CLIQUE, and SAT are in
NP. We'll consider HAMILTONIAN-CIRCUIT in Theorem 28.22. We leave the rest as
exercises.

28.2.3  TSP 

We argued above that there exists a nondeterministic, polynomial-time Turing machine
that decides TSP-DECIDE. Now we prove that claim. Just to make it clear how such a
machine might work, we will describe in detail the Turing machine TSPdecide. Let V be
the vertices in G and E be its edges. TSPdecide will nondeterministically consider all
paths through G with length equal to |V|. There is a finite number of such paths and
each of them has finite length, so all paths of TSPdecide will eventually halt. TSPdecide
will accept w if it finds at least one path that corresponds to a Hamiltonian circuit and
that has cost less than cost. Otherwise it will reject. TSPdecide will use three tapes. The
first will store the input G. The second will keep track of the path that is being built. And
the third will contain the total cost of the path so far. We define TSPdecide as follows:

TSPdecide(<G: graph with vertices V and edges E, cost: integer>) =

1. Initialize by nondeterministically choosing a vertex in G. Put that vertex on
the path that is stored on tape 2. Write 0 on tape 3.

2. Until the number of vertices on the path on tape 2 is equal to |V| + 1 or this
path fails do:

2.1. Nondeterministically choose an edge e in E.

2.2. Check that one endpoint of e is the last vertex on the current path.

2.3. Check that either:

• the number of vertices on the path equals |V| and the other endpoint
of e is the same as the first vertex in the path, or

• the number of vertices on the path is less than |V| and the other endpoint
of e is not already on the path.

2.4. Add the cost of e to the path cost that is stored on tape 3 and check that
the result is less than cost.

2.5. If conditions 2.2, 2.3, and 2.4 are satisfied then add the second endpoint
of e to the current path.

2.6. Else this path fails. Exit the loop.

3. If the loop ended normally, accept. If it ended by the path failing, reject.

We analyze timereq(TSPdecide) as follows: The initialization in step 1 takes
O(|<G, cost>|) time. The longest path that TSPdecide will consider contains |V| + 1
vertices. (It may also consider some shorter paths if they fail before completing a circuit.)
So TSPdecide goes through the step 2 loop at most O(|<G, cost>|) times. Each step of
that loop takes O(|<G, cost>|) time. So timereq(TSPdecide) ∈ O(|<G, cost>|^2).
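A deterministic verifier for TSP-DECIDE can perform essentially the same checks as steps 2.2 through 2.4, applied to a circuit that is supplied as a certificate rather than to nondeterministically chosen edges. The Python sketch below is one way to write such a verifier. The graph representation (a set of vertices plus a dictionary of edge distances) and the name tsp_verify are assumptions made for illustration.

```python
def tsp_verify(vertices, dist, cost, circuit):
    """Accept iff circuit is a Hamiltonian circuit of total cost less than cost.

    vertices: set of vertex names.
    dist: dict mapping frozenset({u, v}) to the positive distance of edge u-v.
    circuit: list of vertices that starts and ends at the same vertex.
    """
    # A Hamiltonian circuit lists |V| + 1 vertices and returns to its start.
    if len(circuit) != len(vertices) + 1 or circuit[0] != circuit[-1]:
        return False
    # Apart from the repeated start, every vertex must appear exactly once.
    if set(circuit[:-1]) != set(vertices):
        return False
    total = 0
    for u, v in zip(circuit, circuit[1:]):
        edge = frozenset((u, v))
        if edge not in dist:          # consecutive vertices must be adjacent
            return False
        total += dist[edge]
    return total < cost
```

Unlike TSPdecide, this sketch checks the cost bound once at the end rather than after every edge. Since all distances are positive, the two checks accept the same circuits.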

We've now described both a nondeterministic decider and a deterministic verifier
for TSP-DECIDE. So proving the next theorem is straightforward.


THEOREM  28.10  TSP-DECIDE  is  in  NP 


Theorem: TSP-DECIDE = {<G, cost> : <G> encodes an undirected graph with
a positive distance attached to each of its edges and G contains a Hamiltonian
circuit whose total cost is less than cost} is in NP.

Proof: The nondeterministic Turing machine TSPdecide decides TSP-DECIDE in
polynomial time.

While it is sometimes instructive to describe a decider or a verifier in detail, as a
Turing machine, as we have just done, we will generally describe them simply as
well-specified algorithms. We will do that in the following examples.

28.2.4  Clique  Detection 

Recall that, given a graph G with vertices V and edges E, a clique in G is a subset of V
with the property that every pair of vertices in the clique is connected by some edge in
E. A k-clique is a clique that contains exactly k vertices.


Clique  detection,  particularly  the  detection  of  maximally  large  cliques,  plays 
an  important  role  in  many  applications  in  computational  biology. 


THEOREM  28.11  CLIQUE  is  in  NP 

Theorem: CLIQUE = {<G, k> : G is an undirected graph with vertices V and
edges E, k is an integer, 1 ≤ k ≤ |V|, and G contains a k-clique} is in NP.

Proof: We can prove this claim by describing a deterministic polynomial time
verifier, clique-verify(<G, k, c>), that takes three inputs, a graph G, an integer
k, and a set of vertices c, where c is a proposed certificate for <G, k>. The job of
clique-verify is to check that c is a clique in G and that it contains k vertices. The
first step of clique-verify is to count the number of vertices in c. If the number is
greater than |V| or not equal to k, it will immediately reject. Otherwise, it will go
on to step 2, where it will consider all pairs of vertices in c. For each, it will go
through the edges in E and check that there is an edge between the two vertices
of the pair. If there is any pair that is not connected by an edge, clique-verify will
reject. If all pairs are connected, it will accept. Step 1 takes time that is linear in
|c|, which is bounded by some polynomial function of |<G, k>|. Step 2 must
consider |c|^2 vertex pairs. For each it must examine at most |E| edges. Since both
|c| and |E| are bounded by |<G, k>|, timereq(clique-verify) ∈ O(|<G, k>|^3).
So clique-verify is a deterministic polynomial-time verifier for CLIQUE.
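The two steps of clique-verify can be sketched in Python as follows. We assume that V is a set of vertex names and that E is a set of two-element frozensets. That representation, like the function name clique_verify, is our choice and is not fixed by the proof.

```python
from itertools import combinations

def clique_verify(vertices, edges, k, c):
    """Accept iff c is a k-clique in the graph (vertices, edges).

    vertices: set of vertex names.
    edges: set of frozenset({u, v}) pairs.
    c: proposed certificate, a collection of vertices.
    """
    c = set(c)
    # Step 1: c must contain exactly k vertices, all of them drawn from V.
    if len(c) != k or not c <= set(vertices):
        return False
    # Step 2: every pair of vertices in c must be connected by an edge.
    return all(frozenset(pair) in edges for pair in combinations(c, 2))
```

Because E is stored as a hashed set here, each pair is checked with a single lookup rather than a scan through all of E as in the proof. That only lowers the running time, so the polynomial bound still holds.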


28.2.5  Boolean  Satisfiability 

In  Section  22.4.1,  we  showed  that  several  key  questions  concerning  Boolean  formulas 
are  decidable.  In  particular,  we  showed  that  SAT  is  in  D.  We  can  now  consider  the 
complexity of SAT. We'll prove here that it and one of its cousins, 3-SAT, are in NP.
You  may  recall  that,  in  Section  22.4.1,  we  also  showed  that  the  problem  of  deciding 
whether  a  Boolean  formula  is  valid  (i.e.,  whether  it  is  true  for  all  assignments  of  values 
to  its  variables)  is  decidable.  It  turns  out  that  that  problem  appears  to  be  harder  than  the 
problem of deciding satisfiability. We'll consider the language VALID = {<w> : w is
a wff in Boolean logic and w is valid} in Section 28.8.


SAT has applications in such domains as computer-aided design, computer-aided
manufacturing, robotics, machine vision, scheduling, and hardware and
software verification. It is particularly useful in verifying the correctness of
digital circuits using a technique called model checking. (H.1.2)


THEOREM  28.12  SAT  is  in  NP 

Theorem: SAT = {<w> : w is a wff in Boolean logic and w is satisfiable} is in NP.

Proof: SAT is in NP because there exists a deterministic polynomial time verifier
for it. SAT-verify(<w, c>) takes two inputs, a wff w and a certificate c, which is
a list of assignments of truth values to the variables of w. The job of SAT-verify is
to determine whether w evaluates to True given the assignments provided by c.
For example:

• The wff w = P ∧ Q ∧ ¬R is satisfiable. The string c = (P = True, Q = True,
R = False) is a certificate for it, since the expression True ∧ True ∧ ¬False
simplifies to True. SAT-verify(<w, c>) will accept.

• The wff w = P ∧ Q ∧ R is satisfiable. But the string c = (P = True, Q =
True, R = False) is not a certificate for it, since the expression True ∧
True ∧ False simplifies to False. So SAT-verify(<w, c>) will reject.

• The wff w = P ∧ ¬P is not satisfiable. So for any c, SAT-verify(<w, c>)
will reject.

Let vars be the number of distinct variables in w. Let ops be the number of
operators in w. Then SAT-verify behaves as follows: For each assignment in c, it
makes one pass through w, replacing all occurrences of the current variable with
the value that c assigns to it. Then it makes at most ops passes through w, on each
pass replacing every operator whose arguments have already been evaluated by
the result of applying the operator to its arguments. Then it checks to see whether
w = True. The first step must consider vars variables and each can be processed
in O(|w|) time. Since vars ≤ |w|, this first step takes O(|w|^2) time. The second
step executes at most ops passes and each pass can be done in O(|w|) time. Since
ops ≤ |w|, the second step takes O(|w|^2) time. Thus SAT-verify takes time
O(|w|^2) and is a deterministic polynomial-time verifier for SAT.

Alternatively,  we  can  build  a  nondeterministic  polynomial-time  decider  for 
SAT. It decides whether a string <w> is in SAT by nondeterministically choosing
a set of assignments to the variables in w. Then it uses SAT-verify to check
whether  that  assignment  proves  that  w  is  satisfiable. 
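A verifier for SAT is easy to sketch in code. The version below does not rewrite the string w pass by pass as SAT-verify does. Instead it assumes that the wff has already been parsed into nested tuples and evaluates it recursively, which is a substitute technique with the same accept/reject behavior. The tuple encoding and the names evaluate, variables, and sat_verify are assumptions made for illustration.

```python
def evaluate(wff, assignment):
    """Evaluate a wff, given as nested tuples, under a dict of truth values.

    Encoding (hypothetical): ("var", name), ("not", f),
    ("and", f, g, ...), ("or", f, g, ...).
    """
    op = wff[0]
    if op == "var":
        return assignment[wff[1]]
    if op == "not":
        return not evaluate(wff[1], assignment)
    if op == "and":
        return all(evaluate(arg, assignment) for arg in wff[1:])
    if op == "or":
        return any(evaluate(arg, assignment) for arg in wff[1:])
    raise ValueError(f"unknown operator: {op}")

def variables(wff):
    """Return the set of variable names that occur in wff."""
    if wff[0] == "var":
        return {wff[1]}
    return set().union(*(variables(arg) for arg in wff[1:]))

def sat_verify(wff, c):
    """Accept iff c assigns a value to every variable of wff and makes wff True."""
    if not variables(wff) <= set(c):
        return False
    return evaluate(wff, c)
```

With P, Q, and R encoded as ("var", "P") and so on, the three examples in the proof behave as described: the certificate (P = True, Q = True, R = False) is accepted for P ∧ Q ∧ ¬R, rejected for P ∧ Q ∧ R, and no certificate is accepted for P ∧ ¬P.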


As  far  as  we  know,  SAT  is  not  also  in  P.  No  polynomial  time  algorithm  to  decide  it  in 
the  general  case  is  known.  But  very  efficient  SAT  solvers  work  well  in  practice.  They 
take  advantage  of  the  fact  that  it  is  typically  not  necessary  to  enumerate  all  possible 
assignments of values to the variables. One technique that exploits this observation
relies on a clever data structure, the ordered binary decision diagram (or OBDD),
which we describe in B.1.3.

Table 28.1 3-CNF and CNF formulas.

We  next  describe  3-SAT,  a  variant  of  SAT  that  we  will  find  useful  in  our  upcoming 
discussion  of  the  complexity  of  several  other  languages.  Before  we  can  define  3-SAT 
we  must  define  conjunctive  normal  form  for  Boolean  formulas: 

•  A  literal  is  either  a  variable  or  a  variable  preceded  by  a  single  negation  symbol. 

•  A  clause  is  either  a  single  literal  or  the  disjunction  of  two  or  more  literals. 

•  A  well-formed  formula  (or  wff)  of  Boolean  logic  is  in  conjunctive  normal  form  (or 
CNF)  iff  it  is  either  a  single  clause  or  the  conjunction  of  two  or  more  clauses. 

•  A  wff  is  in  3-conjunctive  normal  form  (or  3-CNF)  iff  it  is  in  conjunctive  normal 
form  and  each  clause  contains  exactly  three  literals. 

Table 28.1 illustrates these definitions. The symbol • indicates that the corresponding
formula is in the matching form.

Every wff can be converted to an equivalent wff in conjunctive normal form. See
B.1.1 for a proof of this claim, as well as more examples of all of the terms that we have
just defined.


THEOREM  28.13  3-SAT  is  in  NP 

Theorem:  3-SAT  =  {<w>  :  w  is  a  wff  in  Boolean  logic,  w  is  in  3-conjunctive 
normal  form  and  w  is  satisfiable}  is  in  NP. 

Proof: 3-SAT is in NP because there exists a deterministic polynomial time verifier
for it. 3-SAT-verify(<w, c>) first checks to make sure that w is in 3-CNF. It can
do that in linear time. Then it calls SAT-verify(<w, c>) to check that c is a
certificate for w.
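The two stages of 3-SAT-verify can be sketched as follows. We assume that a CNF formula is given as a list of clauses, each clause being a list of (variable, polarity) pairs, where polarity False marks a negated variable. This encoding and the names is_3cnf and three_sat_verify are ours.

```python
def is_3cnf(clauses):
    """Check the 3-CNF shape: at least one clause, exactly three literals in each."""
    return len(clauses) >= 1 and all(len(clause) == 3 for clause in clauses)

def three_sat_verify(clauses, c):
    """Accept iff the formula is in 3-CNF and the assignment c (a dict) satisfies it.

    A literal (var, polarity) is true under c when c[var] == polarity.
    An unassigned variable makes its literals count as not satisfied.
    """
    if not is_3cnf(clauses):
        return False
    return all(any(c.get(var) == polarity for var, polarity in clause)
               for clause in clauses)
```

The shape check makes one pass over the clauses, and the satisfaction check looks at each literal at most once, so both stages run in time linear in the size of the formula under this encoding.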


28.3  Does  P  =  NP? 

While  we  know  some  things  about  the  relationship  between  P  and  NP,  a  complete 
answer to the question, "Are they equal?" has, so far, remained elusive.

We begin by describing what we do know:

THEOREM 28.14 Every Language in P is also in NP

Theorem: P ⊆ NP.

Proof:  Let  L  be  an  arbitrary  language  in  P.  Then  there  exists  a  deterministic 
polynomial  time  decider  M  for  L.  But  M  is  also  a  nondeterministic  polynomial 
time  decider  for  L.  (It  just  doesn’t  have  to  make  any  guesses.)  So  L  is  in  NP. 


So  all  of  the  following  languages  are  in  NP: 

•  every  context-free  language, 

•  EULERIAN-CIRCUIT, 

• MST, and

•  PRIMES. 

But what about the other direction? Are there languages that are in NP but that are
not in P? Alternatively (since we just showed that P ⊆ NP), does P = NP? No one
knows. There are languages, like TSP-DECIDE, CLIQUE, and SAT, that are known to
be in NP but for which no deterministic polynomial time decision procedure is known. But
no one has succeeded in proving that those languages, or many others that are in NP,
are not also in P.

The question, "Does P = NP?" is one of seven Millennium Problems; a $1,000,000
prize awaits anyone who can solve it. By the way, most informed bets are on the answer
to the question being, "No." Further, it is widely believed that even if it should turn out to
be  possible  to  prove  that  every  language  that  is  in  NP  is  also  in  P,  it  is  exceedingly 
unlikely  that  that  proof  will  lead  to  the  development  of  practical  polynomial  time 
algorithms  to  decide  languages  like  TSP-DECIDE  and  SAT.  There  is  widespread 
consensus  that  if  such  algorithms  existed  they  would  have  been  discovered  by  now 
given  the  huge  amount  of  effort  that  has  been  spent  looking  for  them. 

While  we  do  not  know  with  certainty  whether  P  =  NP,  we  do  know  something 
about  how  the  two  classes  relate  to  other  complexity  classes.  In  particular,  define: 

• PSPACE: For any language L, L ∈ PSPACE iff there exists some deterministic
Turing machine M that decides L and spacereq(M) ∈ O(n^k) for some k.

• NPSPACE: For any language L, L ∈ NPSPACE iff there exists some nondeterministic
Turing machine M that decides L and spacereq(M) ∈ O(n^k) for some k.

• EXPTIME: For any language L, L ∈ EXPTIME iff there exists some deterministic
Turing machine M that decides L and timereq(M) ∈ O(2^(n^k)) for some k. We'll
consider the class EXPTIME in Section 28.9.

Chapter 29 is devoted to a discussion of space complexity classes, including
PSPACE and NPSPACE. We'll mention here just one important result: Savitch's Theorem
(which we state as Theorem 29.2) tells us that any nondeterministic Turing machine
can be converted to a deterministic one that uses at most quadratically more
space. So, in particular, PSPACE = NPSPACE and we can simplify our discussion by
considering just PSPACE.

We can summarize what is known about P, NP, and the new classes PSPACE and
EXPTIME as follows:

• P ⊆ NP ⊆ PSPACE ⊆ EXPTIME.

In addition, in Section 28.9.1, we will prove the Deterministic Time Hierarchy Theorem,
which tells us that P ≠ EXPTIME. So at least one of the inclusions shown above
must be proper. It is generally assumed that all of them are, but no proofs of those
claims exist.

Because we know that P ≠ EXPTIME, we also know that there exist decidable but
intractable problems.


28.4  Using  Reduction  in  Complexity  Proofs 

In  Chapter  21,  we  used  reduction  to  prove  decidability  properties  of  new  languages  by 
reducing  other  languages  to  them.  Since  all  we  cared  about  then  was  decidability,  we 
accepted  as  reductions  any  Turing  machines  that  implemented  computable  functions. 
We  were  not  concerned  with  the  efficiency  of  those  Turing  machines.  We  can  also  use 
reduction to prove complexity properties of new languages based on the known complexity
properties of other languages. When we do that, though, we will need to place
bounds  on  the  complexity  of  the  reductions  that  we  use.  In  particular,  it  is  important 
that  the  complexity  of  any  reduction  we  use  be  dominated  by  the  complexity  of  the 
language  we  are  reducing  to.  To  guarantee  that,  we  will  exploit  only  deterministic, 
polynomial-time  reductions. 

All of the reductions that we will use to prove complexity results will be mapping
reductions. Recall, from Section 21.2.2, that a mapping reduction R from L1 to L2 is
a Turing machine that implements some computable function f with the property
that:

∀x (x ∈ L1 ↔ f(x) ∈ L2).

Now suppose that R is a mapping reduction from L1 to L2 and that there exists a
Turing machine M that decides L2. Then to decide whether some string x is in L1 we
first apply R to x and then invoke M to decide membership in L2. So C(x) = M(R(x))
will decide L1.

Suppose that there exists a deterministic, polynomial-time mapping reduction R
from L1 to L2. Then we'll say that L1 is deterministic, polynomial-time reducible to L2,
which we'll write as L1 ≤P L2. And, whenever such an R exists, we note that:

• L1 must be in P if L2 is: If L2 is in P then there exists some deterministic, polynomial-time
Turing machine M that decides it. Because R runs in polynomial time, |R(x)| is
bounded by a polynomial in |x|, and the composition of two polynomials is a
polynomial. So M(R(x)) is also a deterministic, polynomial-time Turing machine
and it decides L1.

• L1 must be in NP if L2 is: If L2 is in NP then there exists some nondeterministic,
polynomial-time Turing machine M that decides it. So, by the same argument,
M(R(x)) is also a nondeterministic, polynomial-time Turing machine and it decides L1.

Given two languages L1 and L2, we can use reduction to:

• Prove that L1 is in P or in NP because we already know that L2 is.

• Prove that L1 would be in P or in NP if we could somehow show that L2 is. When we
do this we cluster languages of similar complexity (even if we're not yet sure what
that complexity is).

In many of the reductions that we will do, we will map objects of one sort to
objects that appear to be of a very different sort. For example, the first reduction that
we show will be from 3-SAT (a language of Boolean wffs) to the graph language
INDEPENDENT-SET. On the surface, Boolean formulas and graphs seem quite
different. So how should the reduction proceed? The strategy we'll typically use is to
exploit gadgets. A gadget is a structure in the target language that mimics the role of
a corresponding structure in the source language. In the 3-SAT to INDEPENDENT-SET
reduction, strings in the source language describe formulas that contain literals
and  clauses.  Strings  in  the  target  language  describe  graphs  that  contain  vertices  and 
edges.  So  we  need  one  gadget  that  looks  like  a  graph  but  that  mimics  a  literal  and 
another  gadget  that  looks  like  a  graph  but  that  mimics  a  clause.  Very  simple  gadgets 
will  work  in  this  case.  In  some  others  that  we’ll  see  later,  more  clever  constructions  will 
be  required. 

Consider  the  language: 

•  INDEPENDENT-SET  =  {<G,  k>  :  G  is  an  undirected  graph  and  G  contains  an 
independent  set  of  at  least  k  vertices}. 

An  independent  set  is  a  set  of  vertices  no  two  of  which  are  adjacent  (i.e.,  connected 
by  a  single  edge).  So,  in  the  graph  shown  in  Figure  28.3,  the  circled  vertices  form  an 
independent  set. 


Consider  a  graph  in  which  the  edges  represent  conflicts  between  the  objects 
that  correspond  to  the  vertices.  For  example,  in  a  scheduling  program  the 
vertices  might  represent  tasks. Then  two  vertices  will  be  connected  by  an  edge 
if  their  corresponding  tasks  cannot  be  scheduled  at  the  same  time  because 
their  resource  requirements  conflict.  We  can  find  the  largest  number  of  tasks 
that  can  be  scheduled  at  the  same  time  by  finding  the  largest  independent  set 
in  the  task  graph. 


FIGURE 28.3 An independent set.

Notice, by the way, that there is an obvious relationship between INDEPENDENT-SET
and CLIQUE. If S is an independent set in some graph G with vertices V and edges
E, then S is also a clique in the graph G' with vertices V and edges E', where E' contains
an edge between each pair of nodes n and m iff E does not contain such an edge. There is
also a relationship between INDEPENDENT-SET and the language CHROMATIC-NUMBER,
which we'll define in Section 28.7.6. While INDEPENDENT-SET asks for
the maximum number of vertices in any one independent set in G, CHROMATIC-NUMBER
asks how many nonintersecting independent sets are required if every vertex
in G is to be in one.
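The relationship between INDEPENDENT-SET and CLIQUE gives a small, concrete instance of the pattern C(x) = M(R(x)). In the sketch below, the reduction R maps <G, k> to <G', k> by complementing the edge set, and M is a brute-force decider for CLIQUE. The function names and the set-based graph representation are assumptions made for illustration, and the brute-force decider takes exponential time. Only the reduction itself is polynomial.

```python
from itertools import combinations

def complement(vertices, edges):
    """Return E': n and m are joined in G' iff E has no edge between them."""
    return {frozenset(p) for p in combinations(vertices, 2)} - set(edges)

def has_clique(vertices, edges, k):
    """Brute-force decider M for CLIQUE (exponential time, illustration only)."""
    return any(all(frozenset(p) in edges for p in combinations(s, 2))
               for s in combinations(vertices, k))

def decide_independent_set_via_clique(vertices, edges, k):
    """C(x) = M(R(x)): map <G, k> to <G', k>, then decide CLIQUE on the result."""
    return has_clique(vertices, complement(vertices, edges), k)
```

On the path graph with edges 1-2, 2-3, and 3-4, the complement has edges 1-3, 1-4, and 2-4. It contains a 2-clique but no 3-clique, which matches the fact that the largest independent set in the path has two vertices.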

THEOREM 28.15 3-SAT is Reducible to INDEPENDENT-SET

Theorem: 3-SAT ≤P INDEPENDENT-SET.

Proof: We show a deterministic, polynomial-time reduction R from 3-SAT to
INDEPENDENT-SET. R must map from a Boolean formula to a graph. Let f be
a Boolean formula in 3-conjunctive normal form. Let k be the number of clauses
in f. R is defined as follows:

R(<f>) =

1. Build a graph G by doing the following:

1.1. Create one vertex for each instance of each literal in f.

1.2. Create an edge between each pair of vertices that correspond to
literals in the same clause.

1.3. Create an edge between each pair of vertices that correspond to
complementary literals (i.e., two literals that are the negation of
each other).

2. Return <G, k>.

For example, consider the formula (P ∨ ¬Q ∨ W) ∧ (¬P ∨ S ∨ T). From this
formula, R will build the graph shown in Figure 28.4.

So  each  literal  gadget  is  a  single  vertex  and  each  clause  gadget  is  a  set  of  three 
vertices  plus  the  edges  that  connect  them. 

R runs in polynomial time. To show that it is correct, we must show that <f> ∈
3-SAT iff R(<f>) ∈ INDEPENDENT-SET.


FIGURE 28.4 Graph gadgets represent a Boolean formula.

We first show that <f> ∈ 3-SAT → R(<f>) ∈ INDEPENDENT-SET. If
<f> ∈ 3-SAT then there exists a satisfying assignment A of values to the variables
in f. We can use that assignment to show that G, the graph that R builds,
contains an independent set S of size at least (in fact, exactly) k. Build S as follows:
From each clause gadget choose one literal that is made true by A. (There
must be one since A is a satisfying assignment.) Add the vertex corresponding
to that literal to S. S will contain exactly k vertices. And it is an independent set
because:

• No two vertices come from the same clause. So step 1.2 could not have created
an edge between them.

• No two vertices correspond to complementary literals (since A cannot make both
a literal and its negation true). So step 1.3 could not have created an edge
between them.

Next we show that R(<f>) ∈ INDEPENDENT-SET → <f> ∈ 3-SAT. If
R(<f>) ∈ INDEPENDENT-SET then the graph G that R builds contains an
independent set S of size at least (again, in fact, exactly) k. We can use that set to
show that there exists some satisfying assignment A for f. Notice that no two vertices
in S come from the same clause gadget (because, if they did, they would be
connected in G). Since S contains at least k vertices, no two are from the same
clause, and f contains k clauses, S must contain one vertex from each clause. So
build A as follows: Assign the value True to each literal that corresponds to a
vertex in S. This is possible because no two vertices in S correspond to complementary
literals (again because, if they did, they would be connected in G).
Assign arbitrary values to all remaining variables. Since each clause will contain at least
one literal whose value is True, the value of f will be True.
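The reduction R is short enough to write out as a program. The sketch below follows steps 1.1 through 2 under an encoding that we assume for illustration: a formula is a list of clauses, each a list of (variable, polarity) literals, and each vertex is tagged with its clause number and position so that every occurrence of a literal gets its own vertex. The brute-force independent-set check is included only so that small cases can be tried. It is exponential and is not part of R.

```python
from itertools import combinations

def reduce_3sat_to_independent_set(clauses):
    """R: map a 3-CNF formula to (vertices, edges, k).

    Each vertex is (clause index, position, variable, polarity).
    """
    # Step 1.1: one vertex per occurrence of a literal.
    vertices = [(i, j, var, pol)
                for i, clause in enumerate(clauses)
                for j, (var, pol) in enumerate(clause)]
    edges = set()
    for u, v in combinations(vertices, 2):
        same_clause = u[0] == v[0]                        # step 1.2
        complementary = u[2] == v[2] and u[3] != v[3]     # step 1.3
        if same_clause or complementary:
            edges.add(frozenset((u, v)))
    return vertices, edges, len(clauses)                  # step 2

def has_independent_set_bruteforce(vertices, edges, k):
    """Try every k-subset of vertices (exponential time, illustration only)."""
    return any(all(frozenset(p) not in edges for p in combinations(s, 2))
               for s in combinations(vertices, k))
```

The double loop in R examines each pair of vertices once, so R runs in time quadratic in the number of literal occurrences. For a two-clause formula with one complementary pair of literals, it produces 6 vertices and 3 + 3 + 1 = 7 edges.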


If  we  want  to  decide  3-SAT,  it  is  unlikely  that  we  would  choose  to  do  so  by  reducing 
it  to  INDEPENDENT-SET  and  then  deciding  INDEPENDENT-SET.  That  wouldn’t 
make  sense  since  none  of  our  techniques  for  deciding  INDEPENDENT-SET  run  any 
faster  than  some  obvious  methods  for  deciding  3-SAT.  But,  having  done  this  reduction, 
we’re  in  a  new  position  if  somehow  a  fast  technique  for  deciding  INDEPENDENT-SET 
were  to  be  discovered.  Then  we  would  instantly  also  have  a  fast  way  to  decide  3-SAT. 


28.5 NP-Completeness and the Cook-Levin Theorem

We  don’t  know  whether  P  =  NP.  Substantial  effort  has  been  expended  both  in  looking 
for  a  proof  that  the  two  classes  are  the  same  and  in  trying  to  find  a  counterexample 
(i.e.,  a  language  that  is  in  NP  but  not  in  P)  that  proves  that  they  are  not.  Neither  of 
those efforts has succeeded. But what has emerged from that work is a class of NP languages
that are maximally hard in the sense that if any one of them should turn out
to be in P then every NP language would also be in P (and thus P would equal NP). This
class of maximally hard NP languages is called NP-complete.

28.5.1 NP-Complete and NP-Hard Languages

Consider  two  properties  that  a  language  L  might  possess: 

1. L is in NP.

2.  Every  language  in  NP  is  deterministic,  polynomial-time  reducible  to  L. 

Using  those  properties,  we  will  define: 

The Class NP-hard: L is NP-hard iff it possesses property 2. Any NP-hard language
L is at least as hard as every language in NP in the sense that if L should turn out to
be in P, every NP language must also be in P. Languages that are NP-hard are generally
viewed as being intractable, meaning that it is unlikely that an efficient (i.e.,
deterministic, polynomial-time) decision procedure exists for any of them.

The Class NP-complete: L is NP-complete iff it possesses both property 1 and
property 2. All NP-complete languages can be viewed as being equivalently hard
in the sense that all of them can be decided in nondeterministic, polynomial time
and, if any one of them can also be decided in deterministic polynomial time,
then all of them can.

Note that the difference between the classes NP-hard and NP-complete is that NP-hard
contains some languages that appear to be harder than the languages in NP (in
the sense that no nondeterministic, polynomial-time decider for them is known to
exist). To see the difference, consider two families of languages whose definitions are
based on popular games:

The languages that correspond to generalizations of many one-person games (or
puzzles) are NP-complete. For example, consider the generalization of Sudoku
(described in N.2.2) to an n x n grid (where n is a perfect square). Then define the
following language:

• SUDOKU = {<b> : b is a configuration of an n x n grid and b has a solution
under the rules of Sudoku}.

SUDOKU is in NP because there exists a straightforward verifier that checks a proposed
solution. It has also been shown to be NP-complete.
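
A verifier of this kind is easy to sketch. The Python function below is an illustration under stated assumptions, not a definition from the text: the grid is assumed to be a list of n lists of integers, with 0 marking an empty cell, and the name `verify_sudoku` is hypothetical. It runs in time polynomial in n, which is what membership in NP requires of a verifier.

```python
from math import isqrt

def verify_sudoku(puzzle, solution):
    """Check that `solution` is a completed n x n grid that agrees with the
    given cells of `puzzle` and obeys the row, column and box rules."""
    n = len(puzzle)
    r = isqrt(n)
    if r * r != n:                      # n must be a perfect square
        return False
    if len(solution) != n or any(len(row) != n for row in solution):
        return False
    expected = set(range(1, n + 1))

    # The solution must keep every cell that the puzzle already fixed.
    for i in range(n):
        for j in range(n):
            if puzzle[i][j] != 0 and puzzle[i][j] != solution[i][j]:
                return False

    rows_ok = all(set(row) == expected for row in solution)
    cols_ok = all({solution[i][j] for i in range(n)} == expected
                  for j in range(n))
    boxes_ok = all(
        {solution[bi + di][bj + dj] for di in range(r) for dj in range(r)} == expected
        for bi in range(0, n, r) for bj in range(0, n, r)
    )
    return rows_ok and cols_ok and boxes_ok
```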


The  complexity  of  Sudoku  is  similar  to  that  of  other  interesting  puzzles. 
(N.2.2) 


On the other hand, the languages that correspond to generalizations of many
two-person games are NP-hard but thought not to be in NP. For example, consider
the generalization of chess to an n x n board. Then define the language:

• CHESS = {<b> : b is a configuration of an n x n chess board and there is a
guaranteed win for the current player}.


The complexity of the language CHESS explains the fact that it took almost
50  years  between  the  first  attempts  at  programming  computers  to  play  chess 
and  the  first  time  that  a  program  beat  a  reigning  chess  champion.  Some  other 
games,  like  Go,  are  still  dominated  by  human  players.  (N.2) 


In Section 28.9, we’ll return to the issue of problems that appear not to be in NP. For
now we can just notice that the reason that CHESS appears not to be in NP is that it is
not possible to verify that a winning move sequence exists just by checking a single
sequence. It appears necessary to check all sequences that result from the choices that
could be made by the opposing player. This can be done in exponential time using
depth-first search, so CHESS is an element of the complexity class EXPTIME (which
we’ll define precisely later).
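
The shape of that depth-first search can be sketched in a few lines. The Python below is a generic illustration, not a chess program: the function name, the two callbacks, and the toy take-away game used in the usage note are all assumptions introduced here. The player to move has a guaranteed win iff some move leads to a position from which the opponent has none, so every reply must be explored.

```python
def has_guaranteed_win(state, moves, is_loss):
    """True iff the player to move can force a win from `state`.

    moves(state)   -> iterable of positions reachable in one move
    is_loss(state) -> True if the player to move has already lost
    The recursion visits the whole game tree, so it takes exponential
    time in the depth of the game.
    """
    if is_loss(state):
        return False
    # A win is guaranteed if at least one move leaves the opponent
    # in a position from which the opponent cannot force a win.
    return any(not has_guaranteed_win(nxt, moves, is_loss)
               for nxt in moves(state))

# Toy game (an assumption for illustration): a pile of stones, each move
# removes one or two, and whoever takes the last stone wins.
def take_away_moves(stones):
    return [stones - take for take in (1, 2) if stones - take >= 0]

def take_away_is_loss(stones):
    return stones == 0      # no stones left: the previous player took the last one
```

For the toy game, `has_guaranteed_win(4, take_away_moves, take_away_is_loss)` holds, while a pile of 3 is a loss for the player to move. A single move sequence does not certify either answer, which is the obstacle to placing such languages in NP.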

The class NP-complete is important. Many of its members correspond to problems,
like the traveling salesman problem, that have substantial practical significance. It is also
one of the reasons that it appears unlikely that P = NP. A deterministic, polynomial-time
decider for any member of NP-complete would prove that P and NP are the same.
Yet, despite substantial effort on many of the known NP-complete problems, no such
decider has been found.

But how can we prove that a language L is NP-complete? To do so requires
that we show that every other language that is in NP is deterministic, polynomial-time
reducible to it. We can't show that just by taking some list of known NP languages
and cranking out the reductions. There is an infinite number of NP
languages.

If we had even one NP-complete language L', then we could show that a new NP language
L is NP-complete by showing that L' is deterministic, polynomial-time reducible
to it. Then every other NP language could be reduced first to L' and then to L. But how
can we get that process started? We need a "first" NP-complete language.


28.5.2 The Cook-Levin Theorem and the NP-Completeness of SAT

Stephen Cook and Leonid Levin independently solved the problem of finding a first
NP-complete language by showing that SAT is NP-complete. Their proof does not depend
on reducing individual NP languages to SAT. Instead it exploits the fact that a language
is in NP precisely in case there exists some nondeterministic, polynomial-time Turing
machine M that decides it. Cook and Levin showed that there exists a polynomial-time
algorithm that, given <M>, maps any string w to a Boolean formula that describes the
sequence of steps that M executes on input w. They showed further that this reduction
guarantees that the formula it constructs is satisfiable iff M ends its computation by
accepting w. So, if there exists a deterministic, polynomial-time algorithm that decides
SAT, then any NP language (decided by some Turing machine M) can be decided in
deterministic, polynomial time by running the reduction (based on <M>) on w and then
running the SAT decider on the resulting formula.


Because SAT is NP-complete, it is unlikely that a polynomial-time decider
for it exists. But Boolean satisfiability is a sufficiently important practical
problem that there exists an entire annual conference devoted to the study
of both theoretical and applied research in this area. We'll say more about
the development of efficient SAT solvers in B.1.3.


It is interesting to note that the NP-completeness proof that we are about to do is
not  the  first  time  that  we  have  exploited  a  reduction  that  works  because  an  arbitrary 
Turing  machine  computation  can  be  simulated  using  some  other  (superficially  quite 
different)  structure: 

• We sketched, in the proof of Theorem 22.4, Turing's argument that the Entscheidungsproblem
is not decidable. That proof makes use of a reduction that maps <M> to a
first-order logic sentence that is provable (given a particular set of axioms) iff M ever
prints 0. The reduction exploits a construction that builds a formula that describes the
sequence of configurations that M enters as it computes.

• We showed, in the proof of Theorem 22.1, that PCP, the Post Correspondence Problem
language, is not decidable. We did that by defining a reduction that maps an
<M,  w>  pair  to  a  PCP  instance  in  such  a  way  that  the  computation  of  M  on  w  can 
be  simulated  by  the  process  of  forming  longer  and  longer  partial  solutions  to  the 
PCP  instance.  Then  we  showed  that  that  process  ends  with  a  complete  solution  to 
the  PCP  instance  iff  M  halts  and  accepts  w. 

• We argued, in the proof of Theorem 22.2, that TILES, which corresponds to a set of
tiling  problems,  is  not  even  semidecidable.  We  did  that  by  defining  a  reduction  from 
a  Turing  machine  to  a  set  of  tiles  in  such  a  way  that  each  new  row  of  any  tiling 
would  correspond  to  the  next  configuration  of  the  Turing  machine  as  it  performed 
its  computation.  So  there  exists  an  infinite  tiling  iff  the  Turing  machine  fails  to  halt. 

• We show, in the proof of Theorem J.1, that a simple security model is undecidable.
We do that by defining a reduction from an arbitrary Turing machine to an access
control matrix. Then we show how the computation of the Turing machine could be
simulated by a sequence of operations on the access control matrix in such a way
that the property $q_f$ leaks iff the Turing machine halts.

The  key  difference  between  those  proofs  and  the  one  we  are  about  to  show  for  the 
Cook-Levin  Theorem  is  that  the  reduction  we’ll  describe  here  works  with  a  description 
of  a  Turing  machine  M  that  is  known  to  halt  (along  all  computational  paths)  and  to  do 
so  in  polynomial  time. 

THEOREM  28.16  Cook-Levin  Theorem 

Theorem: SAT = {<w> : w is a wff in Boolean logic and w is satisfiable} is
NP-complete.

Proof: By Theorem 28.12, SAT is in NP. So it remains to show that it is NP-hard
(i.e., that all NP languages are deterministic, polynomial-time reducible to it).

We’ll show a generic reduction R from any NP language to SAT. To use R as a
reduction from a particular language $L \in$ NP, we will provide it with M, one of
the nondeterministic, polynomial-time deciding machines that must exist for L.
Then, when R is applied to a particular string w, it will construct a (large but
finite) Boolean formula that is satisfiable iff at least one of M's computational
paths accepts w.

To  see  how  R  works,  imagine  a  two-layer  table  that  describes  one  computational 
path  that  M  can  follow  on  some  particular  input  w.  An  example  of  such  a  table  is 
shown  in  Figure  28.5.  Imagine  that  the  second  tier,  shown  in  bold,  is  overlaid  on  top 
of  the  first  tier. 

Each row of the table corresponds to one configuration of M. The first row corresponds
to M's starting configuration (in which the read/write head is positioned
on top of the blank immediately to the left of the first symbol of w) and each succeeding
row corresponds to the configuration that results from the one before it.
Some row will correspond to M's halting configuration and we won’t care what
any of the rows after that one look like. We don't know how many steps M executes
before it halts so we don't know exactly how many rows we will need. But we
do know that timereq(M) is some polynomial function $f(|w|)$ that is an upper
bound on the number of steps. So we’ll just let the table have $f(|w|) + 1$ rows.

The  lower  tier  of  the  table  will  encode  the  contents  of  M's  tape.  The  upper  tier 
will  indicate  M's  current  state  and  the  position  of  M's  read/write  head.  So  all  the 
cells  of  the  upper  tier  will  be  empty  except  the  one  in  each  row  that  corresponds 
to  the  square  that  is  under  the  read/write  head.  Each  of  those  nonempty  cells  will 
contain a symbol that corresponds to M's current state. So, in the table shown
above, the upper tier is empty except for the cells that contain an expression of
the form $q_n/c$.

In each configuration, M's tape is infinite in both directions. But only a finite
number of squares can be visited by M before it halts. We need only represent the
squares that contain the original input plus the others that might be visited. It is
possible that M spends all of its steps moving in a single direction. So, after $f(|w|)$
steps, the read/write head might be $f(|w|)$ squares to the left of where it began.
Or it might be $f(|w|)$ squares, including the original input string, to the right of
where it began. In order to allow room for either of these worst cases, we will
include, in each tape description, the $f(|w|)$ tape squares to the left of the initial
read/write head position and $\max(f(|w|), |w|)$ tape squares to the right of it.

FIGURE 28.5 A two-layer table that describes one computational path of M on w.

For example, suppose that, on input aab, one of M's paths runs for 5 steps and
halts in the accepting state y. If $f(3) = 6$, then that path might be described in the
table shown above (where $q_n/c$ means that the lower tier contains the symbol c
and the upper tier contains $q_n$).

To make it easier to talk about this table, let $rows = f(|w|) + 1$ be the number
of rows it contains and let $cols = f(|w|) + \max(f(|w|), |w|) + 1$ be the
number of columns it contains. Let $padleft = f(|w|)$.
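
As a quick check of these definitions, the following Python sketch computes the three quantities. The function name `table_dimensions` is hypothetical, and the bound f is passed in as an ordinary Python function, which is an assumption made only for illustration.

```python
def table_dimensions(f, w):
    """Return (rows, cols, padleft) for the computation table of M on w,
    where f bounds the number of steps M takes on inputs of length |w|."""
    steps = f(len(w))
    rows = steps + 1                       # one row per configuration
    padleft = steps                        # squares to the left of the head's start
    cols = steps + max(steps, len(w)) + 1  # left padding + right side + head's start square
    return rows, cols, padleft
```

With w = aab and a bound for which f(3) = 6, as in the example above, this gives 7 rows, 13 columns, and padleft = 6.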

The job of the reduction R, with respect to some particular Turing machine M,
is to map a string w to a Boolean formula that describes a table such as the one
above. R will guarantee that the formula it builds is satisfiable iff all of the following
conditions are met:

1.  The  formula  describes  a  legal  table  in  the  sense  that: 

1.1.  The  upper  tier  contains  exactly  one  state  marker  per  row. 

1.2.  The  lower  tier  contains  exactly  one  symbol  per  tape  square. 

2. The formula describes a table whose first row represents the initial configuration
of M on input w.

3.  The  formula  describes  a  table  some  row  of  which  represents  an  accepting 
configuration  of  M  on  input  w  (i.e.,  the  upper  tier  contains  the  state  y). 

4. The formula describes a table that simulates a computation that M could
actually perform. So every row, except the first and any that come after the
accepting configuration, represents a configuration that, given the transition
relation that defines M, can immediately follow the configuration that
is described by the preceding row.

Given  these  constraints,  checking  whether  the  formula  that  R  builds  is  satisfiable 
is  equivalent  to  checking  that  there  exists  some  computation  of  M  that  accepts  w. 

It would be easy to write a first-order logic formula that satisfies conditions
1-4. We could write quantified formulas that said things like, "In every row there
exists a square that contains a state symbol and every other square in that row
does not contain a state symbol." But the key to defining R is to realize that we
can also write a Boolean formula that says those same things. The reason we can
is that we know that M halts and we have a bound on the number of steps it will
execute before it halts. So we know the size of the table that describes its computation.
That means that, instead of creating variables that can range over rows or
tape squares, we can simply create individual variables for each property of each
cell in the table. (Notice, by the way, that if we tried to take the approach we're
using here and use it to reduce an arbitrary, i.e., not necessarily NP, language to
SAT, it wouldn't work because we would then have no such bound on the size of
the table. So, for example, we couldn't try this with the halting language H.)

Imagine the cells in the computation table that we described above as being
labeled with a row number i and a column number j. We'll label the cell in the
upper left corner (1,1). Let $\Gamma$ be M's tape alphabet and let K be the set of its states.
We can now define the variables that R will use in mapping a particular input w to a formula:

• For each i and j such that $1 \le i \le rows$ and $1 \le j \le cols$ and for each symbol
c in $\Gamma$, create the variable $tape_{i,j,c}$. When the variable $tape_{i,j,c}$ is used in a
formula that describes a computational table, it will be assigned the value
True if cell (i, j) contains the tape symbol c. Otherwise it will be False. These
variables then describe the lower tier of the computational table.

• For each i and j such that $1 \le i \le rows$ and $1 \le j \le cols$ and for each state
q in K, create the variable $state_{i,j,q}$. When the variable $state_{i,j,q}$ is used in a
formula that describes a computational table, it will be assigned the value
True if cell (i, j) contains the state symbol q. Otherwise it will be False. These
variables then describe the upper tier of the computational table.
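
To make the two families of variables concrete, here is a small Python sketch that enumerates them. Representing a variable as a tuple such as `("tape", i, j, c)` is an assumption made for illustration, as is the function name `make_variables`.

```python
from itertools import product

def make_variables(rows, cols, gamma, states):
    """Enumerate one variable per (cell, tape symbol) pair and one per
    (cell, state) pair. Rows and columns are numbered from 1."""
    cells = list(product(range(1, rows + 1), range(1, cols + 1)))
    tape_vars = [("tape", i, j, c) for (i, j) in cells for c in gamma]
    state_vars = [("state", i, j, q) for (i, j) in cells for q in states]
    return tape_vars, state_vars
```

The lengths of the two lists are rows * cols * |Γ| and rows * cols * |K|, which is the count used later when bounding the size of the formula.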

We’re  now  ready  to  describe  the  process  by  which  R  maps  a  string  w  to  a 
Boolean  formula  DescribeMonw,  which  will  be  composed  of  four  conjuncts,  each 
corresponding  to  one  of  the  four  conditions  we  listed  above.  In  order  to  be  able 
to  state  these  formulas  concisely,  we’ll  define  the  notations: 

$$\bigwedge_{1 \le i \le rows} tape_{i,j,c} \quad \text{and} \quad \bigvee_{1 \le i \le rows} tape_{i,j,c}$$

The  first  represents  the  Boolean  AND  of  a  set  of  propositions  and  the  second 
represents  their  Boolean  OR. 

CONJUNCT 1: The first conjunct will represent the constraint that the
table  must  describe  a  single  computational  path.  Without  this  constraint  it 
would  be  possible,  if  M  is  nondeterministic,  to  satisfy  all  the  other  constraints 
and  yet  describe  a  table  that  jumbles  multiple  computational  paths  together 
(thus  telling  us  nothing  about  any  of  them). 

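For each cell (i, j), we need to say that the variable corresponding to some tape
symbol c is True and all of the ones corresponding to other tape symbols are
False. So, for a given (i, j), let $T_{i,j}$ say that cell (i, j) contains symbol $c_1$ and not any
others or it contains symbol $c_2$ and not any others and so forth up to symbol $c_{|\Gamma|}$:

$$T_{i,j} \equiv \bigvee_{c \in \Gamma} \Bigl( tape_{i,j,c} \wedge \bigl( \bigwedge_{s \in \Gamma,\, s \ne c} \neg tape_{i,j,s} \bigr) \Bigr).$$

Then let Tapes say that this is true for all squares in the table. So:

$$Tapes \equiv \bigwedge_{1 \le i \le rows} \Bigl( \bigwedge_{1 \le j \le cols} T_{i,j} \Bigr).$$

We also need to say that each row contains exactly one state symbol. So let $Q_{i,j}$
say that cell (i, j) contains exactly one state symbol:

$$Q_{i,j} \equiv \bigvee_{q \in K} \Bigl( state_{i,j,q} \wedge \bigl( \bigwedge_{p \in K,\, p \ne q} \neg state_{i,j,p} \bigr) \Bigr).$$

Then let States say that, for each row, there is exactly one column for which
that is true:

$$States \equiv \bigwedge_{1 \le i \le rows} \Bigl( \bigvee_{1 \le j \le cols} \bigl( Q_{i,j} \wedge \bigwedge_{1 \le k \le cols,\, k \ne j} \neg Q_{i,k} \bigr) \Bigr).$$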
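
The "exactly one" pattern behind the tape and state constraints can be checked mechanically. The Python sketch below evaluates the constraints against a truth assignment rather than building formula text. It is an illustration under assumptions: variables are the tuples `("tape", i, j, c)` and `("state", i, j, q)`, the assignment is a dictionary from those tuples to booleans, and the function names are hypothetical. The row constraint follows the prose definition: in each row there is exactly one column whose cell holds exactly one state symbol.

```python
def exactly_one_symbol(assignment, kind, i, j, symbols):
    """Some symbol's variable for cell (i, j) is True and the variables
    for all other symbols in that cell are False."""
    return any(
        assignment[(kind, i, j, c)]
        and all(not assignment[(kind, i, j, s)] for s in symbols if s != c)
        for c in symbols
    )

def tapes(assignment, rows, cols, gamma):
    """Every cell of the lower tier holds exactly one tape symbol."""
    return all(exactly_one_symbol(assignment, "tape", i, j, gamma)
               for i in range(1, rows + 1) for j in range(1, cols + 1))

def states(assignment, rows, cols, k_states):
    """In every row, exactly one column holds exactly one state symbol."""
    def q(i, j):
        return exactly_one_symbol(assignment, "state", i, j, k_states)
    return all(
        any(q(i, j) and all(not q(i, k) for k in range(1, cols + 1) if k != j)
            for j in range(1, cols + 1))
        for i in range(1, rows + 1)
    )

def conjunct1(assignment, rows, cols, gamma, k_states):
    return (tapes(assignment, rows, cols, gamma)
            and states(assignment, rows, cols, k_states))
```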


So we have:

$$Conjunct_1 \equiv Tapes \wedge States.$$

CONJUNCT 2: The second conjunct will represent the constraint that the first
row of the table must correspond to M's initial configuration when started on
input w. Assume that M's start state is $q_0$. We'll first describe the lower tier of the
table (the symbols on M's tape). The first padleft + 1 squares will be blank.
Then the input string w will appear and then all remaining squares will be blank.
Let w[j] be the jth symbol in w. Let:

$$Blanks \equiv \Bigl( \bigwedge_{1 \le j \le padleft+1} tape_{1,j,blank} \Bigr) \wedge \Bigl( \bigwedge_{padleft+|w|+2 \le j \le cols} tape_{1,j,blank} \Bigr).$$

$$Initialw \equiv \bigwedge_{padleft+2 \le j \le padleft+|w|+1} tape_{1,j,w[j-(padleft+1)]}.$$

Now we describe the upper tier of the table. We need to say that M is in state
$q_0$ with its read/write head immediately to the left of the first square of w. Let:

$$Initialq \equiv state_{1,padleft+1,q_0}.$$

Then we have:

$$Conjunct_2 \equiv Blanks \wedge Initialw \wedge Initialq.$$
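
The second conjunct is just a list of variables that must all be True. A minimal Python sketch, assuming the tuple representation `("tape", 1, j, c)` and `("state", 1, j, q)` for variables and using `"_"` for the blank and `"q0"` for the start state (all assumptions made for illustration), lists them in column order:

```python
def initial_row_literals(w, padleft, cols, blank="_", start_state="q0"):
    """Variables that the second conjunct forces to be True in row 1."""
    literals = []
    # Left blanks: columns 1 .. padleft + 1 (the last one is under the head).
    for j in range(1, padleft + 2):
        literals.append(("tape", 1, j, blank))
    # The input: columns padleft + 2 .. padleft + |w| + 1.
    for j in range(padleft + 2, padleft + len(w) + 2):
        # w[j - (padleft + 1)] in the text's 1-based indexing.
        literals.append(("tape", 1, j, w[j - (padleft + 1) - 1]))
    # Right blanks: columns padleft + |w| + 2 .. cols.
    for j in range(padleft + len(w) + 2, cols + 1):
        literals.append(("tape", 1, j, blank))
    # The head starts on the blank immediately to the left of w.
    literals.append(("state", 1, padleft + 1, start_state))
    return literals
```

For w = aab with padleft = 6 and cols = 13, the list has 14 entries: one tape variable per column plus one state variable.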


CONJUNCT 3: The third conjunct will represent the constraint that M's computation
must halt in the accepting state y. This means that some cell in the upper
tier of the table must contain the state symbol y.

$$Conjunct_3 \equiv \bigvee_{1 \le i \le rows} \; \bigvee_{1 \le j \le cols} state_{i,j,y}.$$

CONJUNCT 4: The fourth and last conjunct will represent the constraint that
successive rows of the table must correspond to successive configurations in a
possible computation of M. To construct this conjunct, our reduction R must have
access to M's transition relation $\Delta$.

The key to the construction of $Conjunct_4$ can be seen by looking through a small
window at the large computation table that we are working with. An example of
such a window is shown in Figure 28.6. Notice that each successive configuration
of M is nearly identical to the previous one. The only tape square that can change
its value is the one under the read/write head. And the read/write head can move
only one square to the left or one square to the right.

FIGURE 28.6 A window into a computation table.

Call the first row in which the accepting state y occurs done. Now consider all
rows from 2 until done. (We don’t care what happens in any rows after the one in
which y appears. They are just in the table because we had to make sure there was
enough room for all the rows that matter.) What we want to say is that, comparing
row i to row i - 1:

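• All the tape squares that aren’t under the read/write head stayed the same.
Let:

$$Sames \equiv \forall\, 2 \le i \le done\ (\forall j\ (\forall c\ (\text{read/write head not in column } j \text{ in row } i-1 \rightarrow (tape_{i,j,c} \leftrightarrow tape_{i-1,j,c})))).$$

• The tape square under the read/write head changed in some way that is
allowed by $\Delta$:

$$ChangedTape \equiv \forall\, 2 \le i \le done\ (\forall j\ (\forall c\ (\text{read/write head in column } j \text{ in row } i-1 \text{ and } tape_{i,j,c} \rightarrow$$
$$\exists p\ (\text{state stored in row } i-1 = p, \text{ and}$$
$$\exists s\ (\text{character in column } j \text{ in row } i-1 = s, \text{ and}$$
$$\exists q\ (((p, s), (q, c, (\rightarrow \mid \leftarrow))) \in \Delta)))))).$$

• The state and the read/write head changed in some way that is allowed by $\Delta$.
There are two possibilities: Either the read/write head moved one square to
the right or it moved one square to the left:

$$ChangedStateAndHead \equiv \forall\, 2 \le i \le done\ (\forall j\ (\forall q\ (state_{i,j,q} \rightarrow \textit{moved-right} \vee \textit{moved-left}))), \text{ where:}$$

$$\textit{moved-right} \equiv (\exists p\ (\text{state stored in row } i-1 = p, \text{ and } \exists s\ (\text{character in column } j-1 \text{ in row } i-1 = s, \text{ and } \exists c\ (((p, s), (q, c, \rightarrow)) \in \Delta)))).$$

$$\textit{moved-left} \equiv (\exists p\ (\text{state stored in row } i-1 = p, \text{ and } \exists s\ (\text{character in column } j+1 \text{ in row } i-1 = s, \text{ and } \exists c\ (((p, s), (q, c, \leftarrow)) \in \Delta)))).$$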

This  last  conjunct  is  the  most  complex  of  the  four.  So  we  will  skip  the  step  in 
which  we  convert  the  quantified  formulas  we’ve  just  presented  to  equivalent 
Boolean  ones.  By  now  it  should  be  clear  that  since  we  are  quantifying  over  a  finite 
set  of  objects,  doing  that  is  straightforward,  although  tedious.  So  we  have: 

$Conjunct_4 \equiv$ the Boolean equivalent of:

$$Sames \wedge ChangedTape \wedge ChangedStateAndHead.$$
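
What the fourth conjunct enforces can be seen more directly by checking two explicit rows instead of Boolean variables. The Python sketch below is a simplified illustration, not the Boolean encoding itself. Its assumptions: a row is a triple (tape, head index, state), the transition relation is a set of pairs ((p, s), (q, c, direction)), and the strings "R" and "L" stand for the right and left arrows. It also requires a single transition to account for the written symbol, the new state, and the head move together, which is slightly stronger than stating the three constraints separately.

```python
def follows(prev, cur, delta):
    """True iff configuration `cur` can immediately follow `prev` under `delta`."""
    tape0, head0, p = prev
    tape1, head1, q = cur
    if len(tape0) != len(tape1):
        return False
    # Sames: every square not under the head in the earlier row is unchanged.
    if any(tape0[j] != tape1[j] for j in range(len(tape0)) if j != head0):
        return False
    # The head moves exactly one square to the right or to the left.
    if head1 == head0 + 1:
        direction = "R"
    elif head1 == head0 - 1:
        direction = "L"
    else:
        return False
    s = tape0[head0]   # symbol that was read
    c = tape1[head0]   # symbol that was written in its place
    return ((p, s), (q, c, direction)) in delta
```

A whole table then describes a legal computation if `follows` holds for each pair of adjacent rows up to the row in which the accepting state first appears.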

The  final  formula  that  R  produces:  We  can  now  state  R.  On  input  w,  it  uses 
<M>  and  constructs  a  description  of  the  Boolean  formula: 

$$DescribeMonw \equiv Conjunct_1 \wedge Conjunct_2 \wedge Conjunct_3 \wedge Conjunct_4.$$

DescribeMonw will have a satisfying assignment to its variables iff there exists
some computational path along which M accepts w. So, for any NP language
L, $L \le_P$ SAT.

It remains to show that R(w) operates in polynomial time. The number of
variables in DescribeMonw can be computed as follows: We know that the number
of steps that M will execute on input w is bounded by some polynomial
function $f(|w|)$. So the number of cells in the computational table is $O(f(|w|)^2)$.
Call that number cellcount. To represent the bottom tier of the table requires
$cellcount \cdot |\Gamma|$ variables. To represent the top tier of the table requires
$cellcount \cdot |K|$ variables. Since both $|\Gamma|$ and $|K|$ are independent of |w|, the number
of variables is then $O(f(|w|)^2)$. So the number of characters required to
encode each instance of a variable when it occurs in a literal in DescribeMonw is
$O(\log(f(|w|)^2))$, which is polynomial in |w|.

Constructing each of the conjuncts that form DescribeMonw is straightforward.
But we must show that the length of each of them is bounded by some
polynomial function of |w|:

• Conjunct 1: Each formula $T_{i,j}$ contains $|\Gamma|^2$ literals. So Tapes contains
$cellcount \cdot |\Gamma|^2$ literals. Each formula $Q_{i,j}$ contains $|K|^2$ literals. So States
contains $cellcount \cdot cols \cdot |K|^2 \in O(f(|w|)^3)$ literals and $Conjunct_1$ contains
$O(f(|w|)^3)$ literals.

• Conjunct 2: We require cols literals to describe the tape contents and 1 to describe
the state and read/write head. So $Conjunct_2$ contains $O(f(|w|))$ literals.

• Conjunct 3: $Conjunct_3$ contains $O(f(|w|)^2)$ literals.

• Conjunct 4: The straightforward way to convert the quantified expressions we
have provided into the required Boolean formulas nests ANDs and ORs to
correspond to the nested universal and existential quantifiers. If we do that,
then we will get formulas with at most $cellcount \cdot |K|^2 \cdot |\Gamma|^2$ literals. Again,
since $|K|$ and $|\Gamma|$ are independent of |w|, we have that $Conjunct_4$ contains
$O(f(|w|)^2)$ literals.

So |DescribeMonw| is polynomial in |w| and it can be constructed in polynomial
time.
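
The counts in this argument can be evaluated for concrete sizes. The Python sketch below simply plugs numbers into the expressions given above. The function name and the dictionary keys are hypothetical, and the value for the fourth conjunct is the stated upper bound rather than an exact count.

```python
def formula_size_estimates(f_of_n, w_len, gamma_size, k_size):
    """Literal and variable counts for DescribeMonw, following the text.

    f_of_n     : the value f(|w|), the bound on the number of steps
    w_len      : |w|
    gamma_size : size of the tape alphabet
    k_size     : number of states
    """
    rows = f_of_n + 1
    cols = f_of_n + max(f_of_n, w_len) + 1
    cellcount = rows * cols
    return {
        "variables": cellcount * (gamma_size + k_size),
        # Tapes plus States
        "conjunct1": cellcount * gamma_size ** 2 + cellcount * cols * k_size ** 2,
        # cols tape literals plus one state literal
        "conjunct2": cols + 1,
        # one state literal per cell
        "conjunct3": cellcount,
        # upper bound from nesting ANDs and ORs
        "conjunct4_bound": cellcount * k_size ** 2 * gamma_size ** 2,
    }
```

For f(|w|) = 6, |w| = 3, three tape symbols, and four states, the table has 91 cells and the formula uses 637 variables. All of the counts grow polynomially with f(|w|).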


28.6  Other  NP-Complete  Problems 

The Cook-Levin theorem gives us our first NP-complete language. In this section we’ll
see that it is not alone.

28.6.1  A  Sampling  of  NP-Complete  Languages 

We’ve  already  described  many  languages  that  can  be  shown  to  be  NP-complete.  In  fact 
every  NP  language  that  we  have  mentioned,  except  for  CHESS  and  those  that  we  have 
said are in P, is provably NP-complete. So all of the following languages are NP-complete:

• SAT = {<w> : w is a wff in Boolean logic and w is satisfiable}.

• 3-SAT = {<w> : w is a wff in Boolean logic, w is in 3-conjunctive normal form
and w is satisfiable}.

• TSP-DECIDE = {<G, cost> : <G> encodes an undirected graph with a
positive distance attached to each of its edges and G contains a Hamiltonian circuit
whose total cost is less than cost}.

• HAMILTONIAN-PATH = {<G> : G is an undirected graph and G contains a
Hamiltonian  path}. 

•  HAMILTONIAN-CIRCUIT  =  {<G>  :  G  is  an  undirected  graph  and  G  contains 
a  Hamiltonian  circuit}. 

• CLIQUE = {<G, k> : G is an undirected graph with vertices V and edges E, k is
an integer, $1 \le k \le |V|$, and G contains a k-clique}.

•  INDEPENDENT-SET  =  {<G,  k>  :  G  is  an  undirected  graph  and  G  contains  an 
independent  set  of  at  least  k  vertices}. 

• SUBSET-SUM = {<S, k> : S is a multiset (i.e., duplicates are allowed) of integers,
k is an integer, and there exists some subset of S whose elements sum
to k}.

• SET-PARTITION = {<S> : S is a multiset (i.e., duplicates are allowed) of objects
each of which has an associated cost and there exists a way to divide S into two
subsets,  A  and  S  -  A,  such  that  the  sum  of  the  costs  of  the  elements  in  A  equals  the 
sum  of  the  costs  of  the  elements  in  S  -  A}. 

•  KNAPSACK  =  {<S,  v,  c>  :  S  is  a  set  of  objects  each  of  which  has  an  associated 
cost  and  an  associated  value,  v  and  c  are  integers,  and  there  exists  some  way  of 
choosing  elements  of  S  (duplicates  allowed)  such  that  the  total  cost  of  the  chosen 
objects  is  at  most  c  and  their  total  value  is  at  least  v}. 

•  SUDOKU  =  {<b>  :  b  is  a  configuration  of  an  n  X  n  Sudoku  grid  and  b  has  a 
solution}. 

Examples  of  other  languages  that  are  also  NP-complete  include: 

• SUBGRAPH-ISOMORPHISM = {<$G_1, G_2$> : $G_1$ is isomorphic to some subgraph
of $G_2$}. Two graphs G and H are isomorphic to each other iff there exists a
way  to  rename  the  vertices  of  G  so  that  the  result  is  equal  to  H.  Another  way  to 
think  about  isomorphism  is  that  two  graphs  are  isomorphic  iff  their  drawings  are 
identical  except  for  the  labels  on  the  vertices. 


The  subgraph  isomorphism  problem  arises  naturally  in  many  domains.  For 
example,  consider  the  problem  of  matching  two  chemical  structures  to  see  if 
one  occurs  within  another. 


• BIN-PACKING = {<S, c, k> : S is a set of objects each of which has an associated
size and it is possible to divide the objects so that they fit into k bins, each of which
has size c}.


The bin packing problem can be extended to two and three dimensions and
it remains NP-complete. The two-dimensional problem arises, for example, in
laying  out  a  newsletter  with  k  pages  and  a  set  of  stories  and  pictures  that 
need  to  be  placed  on  the  pages.  The  three-dimensional  problem  arises,  for 
example,  in  assigning  cargo  to  a  set  of  trucks  or  train  cars. 


• SHORTEST-SUPERSTRING = {<S, k> : S is a set of strings and there exists
some superstring T such that every element of S is a substring of T and T has length
less than or equal to k}.


The shortest superstring problem arises naturally during DNA sequencing.
The  problem  there  is  to  find  the  most  likely  larger  molecule  from  which  a  set 
of  fragments  were  derived.  (K.5) 


• BOUNDED-PCP = {<P, k> : P is an instance of the Post Correspondence problem
(as described in Section 22.2) that has a solution of length less than or equal to k}.
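
Every language in these lists is in NP, which means that a proposed certificate can be checked in polynomial time. As an illustration, here are Python sketches of certificate checkers for two of them, SUBSET-SUM and CLIQUE. The function names, the use of Python lists for multisets, and the representation of edges as a set of two-element frozensets are assumptions made for this sketch.

```python
from collections import Counter
from itertools import combinations

def verify_subset_sum(s, k, chosen):
    """`chosen` is a proposed sub-multiset of the multiset `s`.
    Accept iff it really is drawn from `s` and its elements sum to k."""
    uses_only_s = not (Counter(chosen) - Counter(s))  # nothing used too often
    return uses_only_s and sum(chosen) == k

def verify_clique(edges, k, chosen):
    """`edges` is a set of frozensets {u, v}; `chosen` is a proposed k-clique.
    Accept iff it has k distinct vertices and every pair is joined by an edge."""
    vertices = set(chosen)
    return len(vertices) == k and all(
        frozenset((u, v)) in edges for u, v in combinations(vertices, 2)
    )
```

Checking a certificate is fast. Finding one is the hard part, and no polynomial-time method for doing so is known for any of these languages.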


28.6.2  Proving  That  a  Language  is  NP-Complete 

To  prove  that  a  new  language  is  NP-complete,  we  will  exploit  the  following  theorem. 
Recall that when we write L1 ≤P L2, we mean that L1 is polynomial-time mapping
reducible to L2.

THEOREM  28.17 


Theorem: If L1 is NP-complete, L1 ≤P L2, and L2 is in NP, then L2 is also
NP-complete.

Proof: If L1 is NP-complete then every other NP language is deterministic,
polynomial-time reducible to it. So let L be any NP language and let R_L be the
Turing machine that reduces L to L1. If L1 ≤P L2, let R2 be the Turing machine that
implements that reduction. Then L can be deterministic, polynomial-time reduced
to L2 by first applying R_L and then applying R2. The composition runs in polynomial
time because the output of R_L has length polynomial in the length of its input, and
a polynomial of a polynomial is a polynomial. Since L2 is in NP and every other
language in NP is deterministic, polynomial-time reducible to it, it is NP-complete.
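
The composition used in this proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy languages L, L1, and L2 and the names compose, r_l, r_2, in_l, and in_l2 are hypothetical, chosen so that the example runs, and none of these languages is NP-complete.

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def compose(r_l: Callable[[A], B], r_2: Callable[[B], C]) -> Callable[[A], C]:
    """Return the reduction that applies r_l first and then r_2."""
    return lambda x: r_2(r_l(x))


# Toy languages (hypothetical, for illustration only).
def in_l(w: str) -> bool:
    """L: strings over {a, b} that contain at least one 'a'."""
    return "a" in w


def in_l2(xs: list[int]) -> bool:
    """L2: lists of integers that contain a nonzero entry."""
    return any(x != 0 for x in xs)


def r_l(w: str) -> str:
    """Reduce L to L1 = bit strings that contain a '1'."""
    return w.replace("a", "1").replace("b", "0")


def r_2(bits: str) -> list[int]:
    """Reduce L1 to L2."""
    return [int(b) for b in bits]


# The composed reduction maps L directly to L2.
r = compose(r_l, r_2)
```

For every string w over {a, b}, in_l(w) and in_l2(r(w)) agree, which is exactly the property that the composed Turing machine in the proof must have.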


Theorem  28.17  tells  us  that  we  can  use  reduction  from  any  known  NP-complete 
language  to  show  that  a  new  language  is  also  NP-complete.  At  this  point,  we  have  only 
one  such  language:  SAT.  So  we  will  begin  by  using  it.  Once  we  have  others,  we  can  use 
whichever  one  makes  the  required  reduction  easy.  In  fact,  the  first  thing  we  will  do  is 
to show that 3-SAT, a close relative of SAT, is NP-complete. Then we'll have 3-SAT as
a  tool  to  use  in  our  other  reductions. 


28.6.3 3-SAT

In  Section  28.2.5  we  defined: 

• 3-SAT = {<w> : w is a wff in Boolean logic, w is in 3-conjunctive normal form,
and w is satisfiable}.

3-SAT  is  a  somewhat  contrived  language.  It  is  significant  primarily  because  doing 
reductions  from  3-SAT  is  often  substantially  easier  than  doing  them  from  SAT. 
3-SAT’s  restricted  form  limits  the  number  of  conditions  that  must  be  considered,  as  we 
saw  in  the  reduction  we  did,  in  Theorem  28.15,  from  3-SAT  to  INDEPENDENT-SET. 


THEOREM 28.18 3-SAT is NP-Complete

Theorem:  3-SAT  is  NP-complete. 

Proof:  We  showed,  in  Theorem  28.13,  that  3-SAT  is  in  NP.  So  all  that  remains  is  to 
show  that  it  is  NP-hard  (i.e.,  that  every  other  language  in  NP  is  deterministic, 
polynomial-time  reducible  to  it). 

We  could  show  that  3-SAT  is  NP-hard  if  we  could  show  a  polynomial-time 
reduction  from  SAT  to  it.  Define: 

R(w:  wff  of  Boolean  logic)  = 

1. Use conjunctiveBoolean (as defined in the proof of Theorem B.1) to construct
w', where w' is in conjunctive normal form and w' is equivalent to w.

2. Use 3-conjunctiveBoolean (as defined in the proof of Theorem B.2) to
construct w", where w" is in 3-conjunctive normal form and w" is satisfiable
iff w' is.

3.  Return  w". 

If R ran in polynomial time, it would be the reduction that we need. In Exercise 28.4,
we show that step two does run in polynomial time. Unfortunately, step one does
not. The length of w' (and thus the time required to construct it) can grow
exponentially with the length of w. There are two approaches that we could take to
solving this problem:

• We can retain the idea of reducing SAT to 3-SAT. We observe that, for R to be a
reduction from SAT to 3-SAT, it is not necessary that w' be equivalent to w. It is
sufficient to ensure that w' is satisfiable iff w is. There exists a polynomial-time
algorithm (described in [Hopcroft, Motwani and Ullman 2001]) that constructs,
from any wff w, a w' that meets that requirement. If we replace step one of R
with that algorithm, R is a polynomial-time reduction from SAT to 3-SAT, so
3-SAT is NP-hard.

• We can prove that 3-SAT is NP-hard directly, using a variant of the proof we
offered for the Cook-Levin Theorem. It is possible to modify the reduction R
that proves the Cook-Levin Theorem so that it constructs a formula in conjunctive
normal form. R will still run in polynomial time. We leave the proof
of this claim as Exercise 28.13. Once R has constructed a conjunctive normal
form formula w, we can use 3-conjunctiveBoolean to construct w' where w' is
in 3-conjunctive normal form and w' is satisfiable iff w is. This composition of
3-conjunctiveBoolean with R shows that any NP language can be reduced to
3-SAT. So 3-SAT is NP-hard.
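
To make the second step concrete, here is a hedged Python sketch of one standard way to turn a conjunctive normal form formula into a 3-conjunctive normal form formula that is satisfiable iff the original is. It is not necessarily the procedure 3-conjunctiveBoolean of Theorem B.2. The names to_3cnf and satisfiable, the integer encoding of literals (+i for variable i, -i for its negation), and the assumption that every clause is nonempty are all choices made for this illustration.

```python
from itertools import product

Clause = list[int]  # +i stands for variable i, -i for its negation


def to_3cnf(clauses: list[Clause], num_vars: int) -> tuple[list[Clause], int]:
    """Return an equisatisfiable formula whose clauses have exactly 3 literals.

    Assumes every clause is nonempty. Returns the new clauses and the new
    number of variables (fresh variables are numbered after the old ones).
    """
    out: list[Clause] = []
    n = num_vars
    for clause in clauses:
        c = list(clause)
        while len(c) > 3:
            n += 1                        # fresh variable z
            out.append([c[0], c[1], n])   # (l1 v l2 v z)
            c = [-n] + c[2:]              # (not z v l3 v ...) is split next
        while len(c) < 3:
            c.append(c[-1])               # pad short clauses by repetition
        out.append(c)
    return out, n


def satisfiable(clauses: list[Clause], num_vars: int) -> bool:
    """Brute-force check, exponential in num_vars; for small tests only."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False
```

Each clause of length n produces at most max(1, n - 2) new clauses, so the output is linear in the size of the input, in contrast to the exponential growth that step one of R can cause.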


28.6.4  Independent-Set 

Recall that, given a graph G, an independent set is a set of vertices of G, no two of
which  are  adjacent  (i.e.,  connected  by  a  single  edge).  Using  that  definition,  we  defined 
the  following  language,  which  we  can  now  show  is  NP-complete: 

• INDEPENDENT-SET = {<G, k> : G is an undirected graph and G contains an
independent  set  of  at  least  k  vertices}. 


THEOREM 28.19 INDEPENDENT-SET is NP-Complete

Theorem:  INDEPENDENT-SET  is  NP-complete. 

Proof:  We  must  prove  that  INDEPENDENT-SET  is  in  NP  and  that  it  is  NP-hard 
(i.e.,  that  every  other  language  in  NP  is  deterministic,  polynomial-time  reducible 
to  it). 

INDEPENDENT-SET is in NP: We describe Ver, a deterministic, polynomial-time
verifier for it: Let G be a graph with vertices V and edges E. Let c be a
certificate for <G, k>: c will be a list of vertices. On input <G, k, c>, Ver checks
that the number of vertices in c is at least k and no more than |V|. If it is not, it
rejects. Next it considers each vertex in c one at a time. For each such vertex v,
it finds all edges in E that have v as one endpoint. It then checks that the other
endpoint of each of those edges is not in c. If every check succeeds, it accepts;
otherwise it rejects. timereq(Ver) ∈ O(|c| · |E| · |c|).
Both |c| and |E| are polynomial in |<G, k>|. So Ver runs in polynomial time.
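
The verifier just described can be sketched in Python as follows. The function name verify_independent_set and the representation of G (a set of vertices and a set of unordered pairs written as tuples) are assumptions made for this sketch. It also adds two sanity checks that the description above leaves implicit: c must not repeat a vertex, and every vertex in c must belong to V.

```python
def verify_independent_set(vertices: set, edges: set, k: int, c: list) -> bool:
    """Accept iff c certifies that G contains an independent set of size >= k."""
    chosen = set(c)
    # Added sanity checks (an assumption of this sketch, not in the text).
    if len(chosen) != len(c) or not chosen <= vertices:
        return False
    # The number of vertices in c must be at least k and no more than |V|.
    if not (k <= len(c) <= len(vertices)):
        return False
    # For each vertex v in c, no edge touching v may have its other endpoint in c.
    for v in c:
        for a, b in edges:
            if a == v and b in chosen:
                return False
            if b == v and a in chosen:
                return False
    return True
```

On the path with edges (1, 2) and (2, 3), the certificate [1, 3] is accepted for k = 2, while [1, 2] is rejected because its two vertices are adjacent.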

INDEPENDENT-SET is NP-hard because Theorem 28.15 tells us that 3-SAT
≤P INDEPENDENT-SET.


28.6.5  Vertex-Cover 

A vertex cover C of a graph G with vertices V and edges E is a subset of V with the
property that every edge in E touches at least one of the vertices in C. Obviously V is a
vertex  cover  of  G.  But  we  are  typically  interested  in  finding  a  smaller  one.  So  we  define 
the  following  language,  which  we  will  show  is  NP-complete: 

• VERTEX-COVER = {<G, k> : G is an undirected graph and there exists a vertex
cover of G that contains at most k vertices}.


To  be  able  to  test  every  link  in  a  network,  it  suffices  to  place  monitors  at  a  set 
of  vertices  that  form  a  vertex  cover  of  the  network.  (1.2) 

We will show that VERTEX-COVER (also called NODE-COVER) is NP-complete
by reducing 3-SAT to it. The proof will provide another example of the use of carefully
constructed gadgets that map the literals and clauses that occur in strings in 3-SAT to
the vertices and edges described by strings in VERTEX-COVER. Alternatively, we
could prove that VERTEX-COVER is NP-complete with a very simple reduction from
INDEPENDENT-SET (since, if S is an independent set in some graph G with vertices
V and edges E, then V - S is a vertex cover of G). We leave that alternative proof as an
exercise.
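
The relationship between independent sets and vertex covers that underlies the alternative proof can be checked by brute force on a small graph. The sketch below is illustrative only: the names is_independent, is_cover, and reduce_is_to_vc are hypothetical, and mapping <G, k> to <G, |V| - k> is one natural way to write the simple reduction, offered here as an assumption rather than as the worked solution to the exercise.

```python
from itertools import combinations


def is_independent(s: set, edges: set) -> bool:
    """True iff no edge has both of its endpoints in s."""
    return all(not (u in s and v in s) for u, v in edges)


def is_cover(c: set, edges: set) -> bool:
    """True iff every edge has at least one endpoint in c."""
    return all(u in c or v in c for u, v in edges)


def reduce_is_to_vc(vertices: set, edges: set, k: int) -> tuple[set, set, int]:
    """Map an INDEPENDENT-SET instance <G, k> to <G, |V| - k>."""
    return vertices, edges, len(vertices) - k


# A 4-cycle with one chord, used only as a small test graph.
V = {1, 2, 3, 4}
E = {(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)}

# S is independent exactly when V - S is a vertex cover.
complement_property_holds = all(
    is_independent(set(s), E) == is_cover(V - set(s), E)
    for r in range(len(V) + 1)
    for s in combinations(V, r)
)
```

Because S is independent exactly when V - S is a cover, G has an independent set of at least k vertices iff it has a vertex cover of at most |V| - k vertices.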

THEOREM 28.20 VERTEX-COVER is NP-Complete

Theorem:  VERTEX-COVER  is  NP-complete. 

Proof:  We  must  prove  that  VERTEX-COVER  is  in  NP  and  that  it  is  NP-hard. 

VERTEX-COVER is in NP: We describe Ver, a deterministic, polynomial-time
verifier for it: Let G be a graph with vertices V and edges E. Let c be a
certificate for <G, k>: c will be a list of vertices. On input <G, k, c>, Ver checks
that the number of vertices in c is at most min(k, |V|). If it is not, it rejects. Next
it considers each vertex in c one at a time. For each such vertex v, it finds all edges
in E that have v as one endpoint and it marks each such edge. Finally, it makes
one pass through E and checks whether every edge is marked. If all of them are,
it accepts; otherwise it rejects. timereq(Ver) ∈ O(|c| · |E|). Both |c| and |E| are
polynomial in |<G, k>|. So Ver runs in polynomial time.
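
A direct Python rendering of this edge-marking verifier is shown below as a sketch. The name verify_vertex_cover and the representation of edges as a list of two-element tuples are assumptions of the example.

```python
def verify_vertex_cover(vertices: set, edges: list, k: int, c: list) -> bool:
    """Accept iff c certifies that G has a vertex cover of at most k vertices."""
    # The certificate may contain at most min(k, |V|) vertices.
    if len(c) > min(k, len(vertices)):
        return False
    # Mark every edge that has some vertex of c as an endpoint.
    marked = set()
    for v in c:
        for e in edges:
            if v in e:
                marked.add(e)
    # One final pass through E: accept only if every edge is marked.
    return all(e in marked for e in edges)
```

The two nested loops perform |c| · |E| endpoint tests, which matches the bound on timereq(Ver) stated above.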

VERTEX-COVER  is  NP-hard:  We  prove  this  by  demonstrating  a  reduction  R 
that  shows  that: 

3-SAT ≤P VERTEX-COVER.

R's  job  is  to  map  a  Boolean  formula  f  (in  3-conjunctive  normal  form)  to  a 
graph.  It  will  exploit  two  kinds  of  gadgets: 

• A variable gadget: For each variable x in f, R will build a simple graph with two
vertices and one edge between them. Label one of the vertices x and the other
one ¬x.

• A clause gadget: For each clause c in f, R will build a graph with three vertices,
one for each literal in c. There will be an edge between each pair of vertices in
this graph.

The variable and clause gadgets must then be connected to correspond to the
structure of f. R will build an edge from every vertex in a clause gadget to the
vertex of the variable gadget with the same label.

So, for example, given the Boolean formula (P ∨ ¬Q ∨ T) ∧ (¬P ∨ Q ∨ S), R
will build the graph shown in Figure 28.7.

[Figure 28.7: variable gadgets and clause gadgets]

Let f be a Boolean formula with c clauses and v variables. Then we can define
R as follows:

R(<f>) =

1. Build a graph G as described above.

2. Let k = v + 2c.

3. Return <G, k>.
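
The three steps of R can be sketched in Python. The encoding is an assumption made for this example: a formula is a list of clauses, each clause is a tuple of three literal strings such as "P" or "~P", and the function name reduce_3sat_to_vc and the vertex names ("var", literal) and ("clause", i, j) are hypothetical.

```python
from itertools import combinations


def reduce_3sat_to_vc(clauses: list[tuple[str, str, str]]):
    """Build <G, k> from a 3-CNF formula, following the gadget construction."""
    variables = sorted({lit.lstrip("~") for c in clauses for lit in c})
    vertices, edges = set(), set()

    # Variable gadgets: two vertices, x and ~x, joined by one edge.
    for x in variables:
        pos, neg = ("var", x), ("var", "~" + x)
        vertices |= {pos, neg}
        edges.add(frozenset({pos, neg}))

    for i, clause in enumerate(clauses):
        # Clause gadget: a triangle with one vertex per literal position.
        tri = [("clause", i, j) for j in range(3)]
        vertices |= set(tri)
        for a, b in combinations(tri, 2):
            edges.add(frozenset({a, b}))
        # Connect each clause vertex to the variable vertex with the same label.
        for j, lit in enumerate(clause):
            edges.add(frozenset({tri[j], ("var", lit)}))

    k = len(variables) + 2 * len(clauses)
    return vertices, edges, k


# The example formula (P v ~Q v T) ^ (~P v Q v S).
G_vertices, G_edges, k = reduce_3sat_to_vc([("P", "~Q", "T"), ("~P", "Q", "S")])
```

For the example formula there are four variables and two clauses, so the sketch produces 14 vertices, 16 edges, and k = 4 + 2 · 2 = 8. Every loop runs over the variables or the clauses of the formula, which is why a construction of this kind takes polynomial time.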

R runs in polynomial time. To show that it is correct, we must show that <f> ∈
3-SAT iff R(<f>) ∈ VERTEX-COVER.

We first show that <f> ∈ 3-SAT → R(<f>) ∈ VERTEX-COVER. If
<f> ∈ 3-SAT, then there exists a satisfying assignment A of values to the
variables in f. We can use that assignment to show that G, the graph that R builds,
contains a vertex cover C of size at most (in fact, exactly) k. We can construct
C by doing the following:

1.  From  each  variable  gadget,  select  the  vertex  that  corresponds  to  the  literal  that 
is  true  in  A.  Add  each  of  those  vertices  to  C. 

2.  Since  A  is  a  satisfying  assignment,  there  must  exist  at  least  one  true  literal  in 
each  clause.  Pick  one  and  put  the  vertices  corresponding  to  the  other  two  into  C. 

C contains exactly k vertices. And it is a cover of G because:

• One vertex from every variable gadget is in C, so all the edges that are internal
to the variable gadgets are covered.

• Two vertices from every clause gadget are in C, so all the edges that are
internal to the clause gadgets are covered.

• All the edges that connect variable gadgets to clause gadgets are covered
because, for each clause gadget:

• Two of the three emerging edges are covered by the two clause gadget
vertices in C.

• The other one must be connected to a variable gadget vertex that
corresponds to a true literal, so that vertex is in C.
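
The construction of C from a satisfying assignment can be sketched in Python and checked on the example formula. As before, the encoding (literal strings such as "P" and "~P", vertex names ("var", literal) and ("clause", i, j)) and the function names build_edges, is_true, cover_from_assignment, and is_cover are assumptions of this sketch. The function cover_from_assignment assumes that the assignment it receives really is satisfying.

```python
from itertools import combinations


def build_edges(clauses):
    """Edges of the graph G that R builds, as two-element frozensets."""
    edges = set()
    for x in {lit.lstrip("~") for c in clauses for lit in c}:
        edges.add(frozenset({("var", x), ("var", "~" + x)}))
    for i, clause in enumerate(clauses):
        for a, b in combinations(range(3), 2):
            edges.add(frozenset({("clause", i, a), ("clause", i, b)}))
        for j, lit in enumerate(clause):
            edges.add(frozenset({("clause", i, j), ("var", lit)}))
    return edges


def is_true(lit, assignment):
    """Value of a literal under an assignment of truth values to variables."""
    return assignment[lit.lstrip("~")] != lit.startswith("~")


def cover_from_assignment(clauses, assignment):
    """Build C as in steps 1 and 2; assumes the assignment is satisfying."""
    # Step 1: from each variable gadget, take the vertex of the true literal.
    cover = {("var", x if val else "~" + x) for x, val in assignment.items()}
    # Step 2: in each clause gadget, leave out one true literal's vertex.
    for i, clause in enumerate(clauses):
        spare = next(j for j, lit in enumerate(clause) if is_true(lit, assignment))
        cover |= {("clause", i, j) for j in range(3) if j != spare}
    return cover


def is_cover(cover, edges):
    """True iff every edge has at least one endpoint in cover."""
    return all(e & cover for e in edges)


clauses = [("P", "~Q", "T"), ("~P", "Q", "S")]
A = {"P": True, "Q": True, "S": False, "T": False}
C = cover_from_assignment(clauses, A)
```

Here C has v + 2c = 4 + 4 = 8 vertices and covers every edge of G, as the three bullet points above argue in general.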

Next we show that R(<f>) ∈ VERTEX-COVER → <f> ∈ 3-SAT. If
R(<f>) ∈ VERTEX-COVER, then the graph G that R builds contains a vertex
cover C of size at most (again, in fact, exactly) k. Notice that C must:

•  Contain  at  least  one  vertex  from  each  variable  gadget  in  order  to  cover  the 
internal  edge  in  the  variable  gadget. 

•  Contain  at  least  two  vertices  from  each  clause  gadget  in  order  to  cover  all 
three  internal  edges  in  the  clause  gadget. 

FIGURE 28.8 A clause gadget

Satisfying those two requirements uses up all k vertices, so the vertices we
have just described are the only vertices in C. In particular, C contains exactly
one vertex from each variable gadget. We can use C to show that there exists
some satisfying assignment A for f. Building A is simple: Assign the value
True to each literal that is the label for one of the vertices in C that comes from a
variable gadget. We note that A is a satisfying assignment for f iff it assigns the
value True to at least one literal in each of f's clauses.

To see why it is certain that A does this, consider an arbitrary clause gadget in
G, as shown in Figure 28.8. Since C is a cover for G, all six of the edges that
connect to vertices in this gadget must be covered. But we know that only two of the
vertices in the gadget are in C. They can cover the three internal edges. But the
three edges that connect to the variable gadgets must also be covered. Only two
can be covered by a vertex in the clause gadget. The other one must be covered
by its other endpoint, which is in some variable gadget. So each clause is connected
to some literal whose corresponding vertex is in C. We made each such literal
True in A. So A assigns the value True to at least one literal in each clause. Thus it
is a satisfying assignment for f.


28.6.6  HAMILTONIAN-CIRCUIT  and  the  Traveling  Salesman  Problem 

We  started  our  discussion  of  complexity,  at  the  beginning  of  Chapter  27,  by  considering 
the  traveling  salesman  problem.  We  observed  then  that,  while  there  exists  an  obvious 
exponential algorithm for solving the problem, there does not exist an obvious
polynomial algorithm for solving it exactly. While it remains an open question whether any
polynomial  algorithm  for  the  traveling  salesman  problem  does  in  fact  exist,  we  can  now 
prove  a  result  that  suggests  that  it  is  relatively  unlikely  that  one  does.  TSP-DECIDE  is 
NP-complete. 

We  have  already  shown  that  TSP-DECIDE  is  in  NP.  But  we  must  also  show  that 
it  is  NP-hard,  which  we  will  do  by  reducing  3-SAT  to  it.  It  turns  out  to  be  easier  to 
map  3-SAT  to  appropriate  graph  structures  if  the  graph  edges  are  directed.  So  we 
will  introduce  a  new  language: 

DIRECTED-HAMILTONIAN-CIRCUIT = {<G> : G is a directed graph and
G contains a Hamiltonian circuit}.

Then  we  will  prove  that: 

3-SAT ≤P DIRECTED-HAMILTONIAN-CIRCUIT ≤P
HAMILTONIAN-CIRCUIT ≤P TSP-DECIDE.




THEOREM  28.21  DIRECTED-HAMILTONIAN-CIRCUIT  is  NP-Complete 

Theorem:  DIRECTED-HAMILTONIAN-CIRCUIT  is  NP-complete. 

Proof:  We  must  prove  that  DIRECTED-HAMILTONIAN-CIRCUIT  is  in  NP 
and  that  it  is  NP-hard. 

DIRECTED-HAMILTONIAN-CIRCUIT is in NP: We describe Ver, a deterministic,
polynomial-time verifier for it: Let G be a graph with vertices V and edges
E. Let c be a certificate for <G>: c will be a list of vertices. On input <G, c>,
Ver checks that the number of vertices in c is |V| + 1. If it is not, it rejects. It also
rejects if the first and last vertices are not identical. Next it considers each vertex v in
c, except the last, one at a time. It marks v in V and rejects if it had previously been
marked. It also checks that the required edge to v exists and rejects if it does not. If
it finishes without rejecting, it accepts. timereq(Ver) ∈ O(|c| · (|V| + |E|)). All of
|c|, |V|, and |E| are polynomial in |<G>|. So Ver runs in polynomial time.
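
The verifier can be sketched in Python as follows. The name verify_hamiltonian_circuit and the representation of G (a set of vertices and a set of directed (u, v) pairs) are assumptions of this sketch. Instead of checking the edge that leads into each vertex, it checks the edge that leaves each vertex of c except the last, which examines the same |V| edges of the claimed circuit.

```python
def verify_hamiltonian_circuit(vertices: set, edges: set, c: list) -> bool:
    """Accept iff c lists a Hamiltonian circuit of the directed graph G."""
    # c must contain |V| + 1 vertices and must start and end at the same one.
    if len(c) != len(vertices) + 1:
        return False
    if c[0] != c[-1]:
        return False
    marked = set()
    for i, v in enumerate(c[:-1]):
        # Reject vertices that are not in V or that were already marked.
        if v not in vertices or v in marked:
            return False
        marked.add(v)
        # The directed edge from v to its successor in c must exist.
        if (v, c[i + 1]) not in edges:
            return False
    return True
```

On the directed triangle with edges (1, 2), (2, 3), and (3, 1), the certificate [1, 2, 3, 1] is accepted, while [1, 3, 2, 1] is rejected because it follows the edges in the wrong direction.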

DIRECTED-HAMILTONIAN-CIRCUIT  is  NP-hard:  We  prove  this  by 
demonstrating  a  reduction  R  that  shows  that: 

3-SAT ≤P DIRECTED-HAMILTONIAN-CIRCUIT.

R's job is to map a Boolean formula Bf (in 3-conjunctive normal form) to a
graph. R will exploit two kinds of gadgets, one to correspond to the variables of
Bf and the other to correspond to the clauses.

We'll describe the variable gadgets first. Let n be the number of variables in
the Boolean formula Bf. If v is the ith such variable, let m be the larger of the
number of occurrences of v or of ¬v in Bf. The gadget that corresponds to v will
have the structure shown in Figure 28.9. We'll call this gadget V_i.

Now imagine a Hamiltonian path (not a circuit) through V_i. It must enter
from the left at a_i and leave it on the right at b_i. There are only two ways to do
that. If the path begins by going down to a t vertex, then it must next go straight
up to the matching f vertex, then crosswise to the next t vertex, up to the matching
f vertex, and so forth. Similarly, if the path begins by going up to an f vertex, it
must next go straight down to the matching t vertex, then crosswise to the next f
vertex, and so forth. A path that did anything else would not be Hamiltonian
since it would not visit all the vertices. So there are two paths through V_i:

• The one that begins by going down to a t vertex. We will use this one to
correspond to assigning to the variable v the value True.

• The one that begins by going up to an f vertex. We will use this one to
correspond to assigning to the variable v the value False.


FIGURE  28.9  A  variable  gadget  in  the  reduction  from  3-SAT  to  DIRECTED- 
HAMILTONIAN-CIRCUIT. 

FIGURE 28.10 Stringing the variable gadgets together.


R will build the variable gadgets V_1 through V_n and then combine them into a
single structure V, as shown in Figure 28.10. Suppose that H is a Hamiltonian
circuit through V. Then H must enter each of the variable gadgets exactly once
(through its a vertex), choose one of the two paths through that gadget (thus
effectively choosing to make the corresponding variable either True or False), leave
that variable gadget (through its b vertex), and then enter the next one.

Next we must describe the clause gadgets. The gadget that corresponds to the
ith clause in the formula Bf will have the structure shown in Figure 28.11. We'll
call this gadget C_i.

Suppose that C_i is part of a graph G that contains some Hamiltonian cycle H.
H must enter through one of C_i's in vertices. Further, note that if it enters in
column j it must also leave through column j. To see why this is so, we consider all the
paths it could take. From in_i,j, H can:

• Go straight down to out_i,j and exit.

• Proceed across to the next in vertex and then down to the matching out one.
From there it can go to the next out vertex (which will be out_i,j) and exit. It
cannot simply exit right away because, if it does, there is no way for H to reach
out_i,j. The two vertices that could precede it are already in H and neither of
them went to it. Without out_i,j, H can't be Hamiltonian.

• Proceed across to the next in vertex and then the next one. From there it can
go down to the matching out vertex, then across to the next and then to the
next (which will be out_i,j) and then exit. It cannot exit at either of the other out
vertices since, if it did, there would again be no way for H to reach out_i,j.


FIGURE 28.11 A clause gadget in the reduction from
3-SAT to DIRECTED-HAMILTONIAN-CIRCUIT.

R's final job is to connect the variable gadgets and the clause gadgets to form
a single graph G that corresponds to the initial Boolean formula Bf. Its goal is to
do so in such a way that there will be a Hamiltonian circuit through G iff Bf is
satisfiable. The idea is that, if such a circuit exists, it will primarily correspond to a
circuit through V, the variable gadget graph. As V has been defined, such a circuit
exists. In fact, several exist since there are two paths through each of the individual
variable gadgets. So what R must do now is to connect V to the clause gadgets
so that there will still be a Hamiltonian circuit if Bf is satisfiable. If, on the other
hand, Bf is not satisfiable, the introduction of the clause gadgets will produce a
graph through which no Hamiltonian circuit exists. What R is going to do is to use
the clause gadgets to introduce detours through V so that this is true.

In each clause gadget C, think of the first column as corresponding to its
clause's first literal, the second column as corresponding to the second
literal, and the third column as corresponding to the third literal. R will create
three detours from V into C and back, one for each of those literals. So R will
consider each of C's three columns in turn. The literal that corresponds to that
column is either some variable v or its negation ¬v:

• Suppose R is working on column i and the corresponding literal is v. Then R
will go to the gadget for v and choose the first of its columns whose t vertex has
not yet been chosen. (Remember that the number of columns in v's gadget is
equal to the larger of the number of instances of v or of ¬v, so such a column
will always be able to be chosen.) Suppose that the vertex labeled t_v,j is chosen.
R will create a detour from t_v,j to C and then back into V to whatever vertex
t_v,j previously linked to. If we end up choosing the path through v's gadget
that corresponds to assigning v the value True, then that successor vertex is
f_v,j. So, when working on column i, R will create a detour by adding an edge
from t_v,j to in_C,i and an edge from out_C,i to f_v,j.

• Suppose, on the other hand, that the corresponding literal is ¬v. Then R will go
to the gadget for v and choose the first of its columns whose f vertex has not
yet been chosen. Suppose that the vertex labeled f_v,j is chosen. Just as above, R
will create a detour from the chosen vertex into C and then back. But this time
it will assume that we will end up choosing the path through v's gadget that
corresponds to assigning v the value False. In that case, the successor vertex of
f_v,j is t_v,j. So, when working on column i, R will create a detour by adding an
edge from f_v,j to in_C,i and an edge from out_C,i to t_v,j.

To see how these detours work, consider the simple example shown in
Figure 28.12. We show the gadget for the variable P (which we've assumed
needs just two columns). We also show the gadget for the clause (P ∨ Q ∨ S).
When R considers that gadget's first column, it goes to the gadget for P and
finds the first available t vertex. Assume it's the first one. Then it adds to G the
two dashed edges.

FIGURE 28.12 Combining the variable and the clause gadgets.

Notice what effect these two new edges have on our ability to find a Hamiltonian
circuit through G. If such a circuit is traversing P's gadget in the True direction (i.e.,
it starts by going from a_P down to t_P,1), then it can now pass through all the vertices
of C, leave C, and then continue through the rest of P's gadget. If, on the other
hand, such a circuit is traversing P's gadget in the False direction, it cannot. It can
enter C, but when it leaves it would have to return to a vertex (f_P,1) that it has
already visited.

Let Bf be a Boolean formula. We can define a reduction R from 3-SAT to
DIRECTED-HAMILTONIAN-CIRCUIT as follows:

R(<Bf>) =

1. Build the graph G as described above.

2. Return <G>.

R runs in polynomial time. To show that it is correct we must show that
<Bf> ∈ 3-SAT iff R(<Bf>) ∈ DIRECTED-HAMILTONIAN-CIRCUIT.

We first show that <Bf> ∈ 3-SAT → R(<Bf>) ∈ DIRECTED-HAMILTONIAN-CIRCUIT.
If <Bf> ∈ 3-SAT, then there exists a satisfying assignment
A of values to the variables in Bf. We can use that assignment to show that G, the
graph that R builds, contains a Hamiltonian circuit. We can construct such a
circuit H as follows: Begin by letting H be just a Hamiltonian circuit through V. We
have a choice, for each variable gadget, of two paths through it. If A assigns the
variable v the value True, then choose the path that begins by going to the first t
vertex in v's gadget. If, on the other hand, A assigns v the value False, then choose
the path that begins by going to the first f vertex in v's gadget.

But now we must add to H the vertices in all the clause gadgets. Since A is a
satisfying assignment, each clause c must contain at least one literal to which A
assigns the value True. Pick one. If it is the (unnegated) variable v, look at v's
gadget. There will be an edge from one of v's t vertices (call it t_v,k) into c's clause
gadget at some in vertex in_c,j, and then back out again (from the same column) to
the f vertex (f_v,k) immediately above t_v,k. H currently includes an edge from t_v,k
to f_v,k. Remove that edge and insert, in its place, the edge from t_v,k to in_c,j. Then
add the edges that visit the other two vertices in the top row of c's gadget,
followed by the three vertices in its bottom row. Finally add the edge that leaves c's
gadget at out_c,j and returns to v at f_v,k.

FIGURE 28.13 How the detours work.

To  see  how  this  works,  consider  the  simple  case  shown  in  Figure  28.13.  Assume 
that  Bf  contains  the  clause  c  =  ( ->P  V  P  V  ->S)  and  that  the  only  variables  in  Bf 
are  P  and  S. Then  the  graph  G  that  R  builds  will  contain  the  two  fragments  shown 
in  the  figure:  V,  the  variable  gadget  structure,  and  C,  the  gadget  for  c.  Notice  that 
there  are  three  paths  into  and  out  of  C,  one  corresponding  to  ->/*,  one  correspon¬ 
ding  to  P ,  and  one  corresponding  to  R.  (Ignore  the  distinction  between  solid  and 
dashed  lines  for  the  moment.) 

Suppose that P is assigned the value True by A and that P is the True literal
that we pick as we are building H. Because A assigns P the value True, H's path
through P's gadget will be a_P, then t_P,1, then f_P,1, then t_P,2, and so forth. Initially
H contains all the edges in V. But now we remove from it the edge from t_P,2 to
f_P,2 and replace it by the set of edges shown above as dashed lines. H can still
continue its path through V. But now it also detours and visits every vertex in C. And
it visits each of them only once because we apply this operation to exactly one of
c's True literals.

Now suppose that, for some clause c, the True literal that we pick is ¬v. Then we
do almost the same thing except that now there will be an edge from one of v's f
vertices (call it f_v,k) into c's clause gadget and then back out again to the t vertex
(call it t_v,k) immediately below f_v,k. H currently includes an edge from f_v,k to t_v,k.
Remove that edge and insert, in its place, the edge from f_v,k into c's gadget. Then,
just as above, add the edges that visit the other two vertices in the top row of c's
gadget, followed by the three vertices in its bottom row. Finally add the edge that
leaves c's gadget and returns to v at t_v,k.

H  is  a  Hamiltonian  circuit  through  the  graph  G  that  R  builds.  It  includes  every 
vertex  in  V  exactly  once.  It  contains  exactly  one  detour  into  each  clause  gadget 
and  that  detour  visits  all  six  of  the  vertices  in  that  gadget.  So  every  vertex  in  G  is 
contained  in  H  exactly  once. 

It remains to show that R(<Bf>) ∈ DIRECTED-HAMILTONIAN-CIRCUIT →
<Bf> ∈ 3-SAT. If R(<Bf>) ∈ DIRECTED-HAMILTONIAN-CIRCUIT
then the graph G that R builds contains a Hamiltonian circuit we
can call H. We use H to construct A, a satisfying assignment of values to the
variables of Bf. Building A is simple: Examine each variable gadget in G. If H
follows the True path through the gadget corresponding to variable v (i.e., it
begins by going from a_v to t_v,1), then assign v the value True. If, on the other
hand, H follows the False path through v's gadget (i.e., it begins by going from
a_v to f_v,1), then assign v the value False. Since H is Hamiltonian, it goes
through each clause gadget exactly once. And, since it is Hamiltonian, one of
the following two things must be true, for each clause gadget, given the way G
was constructed:

• H connects to the clause gadget in a column that corresponds to a positive
literal v and it does so by a detour from a True path. In this case, A assigns v the
value True and so the clause is satisfied.

• H connects to the clause gadget in a column that corresponds to a negated
literal ¬v and it does so by a detour from a False path. In this case, A assigns v the
value False. So ¬v is True and so the clause is satisfied.

Since each of its clauses is satisfied, Bf is also satisfied.

The reduction R that we just described, from 3-SAT to DIRECTED-HAMILTONIAN-CIRCUIT,
only worked because the edges in the graph that R
built were directed. But the fundamental question, "Does a Hamiltonian circuit
exist?" is just as hard to answer for undirected graphs. We prove that result next,
using a very simple reduction from DIRECTED-HAMILTONIAN-CIRCUIT.

THEOREM 28.22 HAMILTONIAN-CIRCUIT is NP-Complete

Theorem: HAMILTONIAN-CIRCUIT = {<G> : G is an undirected graph and
G contains a Hamiltonian circuit} is NP-complete.

Proof: We must prove that HAMILTONIAN-CIRCUIT is in NP and that it is NP-hard.

HAMILTONIAN-CIRCUIT is in NP: Ver, the verifier that we just described in the
proof of Theorem 28.21, also works here. It will simply consider undirected
edges instead of requiring directed ones.

HAMILTONIAN-CIRCUIT  is  NP-hard:  We  prove  this  by  demonstrating  a 
reduction  R  that  shows  that: 

DIRECTED-HAMILTONIAN-CIRCUIT ≤P HAMILTONIAN-CIRCUIT.

Given a directed graph G, R will build an undirected graph G'. Each of G's
vertices will be represented in G' by a gadget that contains three vertices
connected by two edges. (For a vertex v of G, we will call the three vertices of
its gadget v_a, v_b, and v_c, from top to bottom.) Further, if there is a
directed edge in G from v to w, then G' will contain an (undirected) edge from
the last of the vertices in v's gadget to the first of the vertices in w's
gadget. Figure 28.14 shows a simple example.

Let G be a directed graph. We can define a reduction R from
DIRECTED-HAMILTONIAN-CIRCUIT to HAMILTONIAN-CIRCUIT as follows:

R(<G>) =

1. Build the graph G' as described above.

2. Return <G'>.

R runs in polynomial time. To show that it is correct we must show that
<G> ∈ DIRECTED-HAMILTONIAN-CIRCUIT iff R(<G>) ∈ HAMILTONIAN-CIRCUIT.
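The construction of G' can be sketched in Python. This is only an illustration under stated assumptions: the function names, the (vertex, part) labels for gadget vertices, and the representation of undirected edges as frozensets are all choices made here, not part of the text, and G is assumed to have no self-loops. The brute-force circuit test is exponential and is included only so that tiny examples can be checked.

```python
from itertools import permutations


def reduce_directed_to_undirected(vertices, directed_edges):
    """Build G' from G: each vertex v becomes the path (v,'a') - (v,'b') - (v,'c')."""
    gadget_vertices = [(v, part) for v in vertices for part in "abc"]
    edges = set()
    for v in vertices:
        edges.add(frozenset({(v, "a"), (v, "b")}))
        edges.add(frozenset({(v, "b"), (v, "c")}))
    for v, w in directed_edges:
        # A directed edge v -> w becomes an undirected edge from the last
        # vertex of v's gadget to the first vertex of w's gadget.
        edges.add(frozenset({(v, "c"), (w, "a")}))
    return gadget_vertices, edges


def has_circuit(vertices, is_edge):
    """Brute-force Hamiltonian circuit test (hypothetical helper, tiny graphs only)."""
    first, rest = vertices[0], vertices[1:]
    for order in permutations(rest):
        tour = (first, *order, first)
        if all(is_edge(x, y) for x, y in zip(tour, tour[1:])):
            return True
    return False
```

For a directed 3-cycle, both G and the 9-vertex graph G' pass has_circuit. For the edges 1→2, 2→3, 1→3, neither does, even though the underlying undirected triangle has a Hamiltonian circuit; that is exactly the information the three-vertex gadgets preserve.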


[Figure 28.14. Panel titles: "Given G (which does contain a Hamiltonian
circuit)" and "Given G (which does not contain a Hamiltonian circuit)".]

FIGURE 28.14 Reductions from DIRECTED-HAMILTONIAN-CIRCUIT to
HAMILTONIAN-CIRCUIT.

We first show that <G> ∈ DIRECTED-HAMILTONIAN-CIRCUIT →
R(<G>) ∈ HAMILTONIAN-CIRCUIT. G must contain at least one Hamiltonian circuit,
which we will call H. Assume that H = (v_1, v_2, ..., v_k, v_1). Then we can
describe H', a Hamiltonian circuit through G'. It starts at the top of v_1's
gadget, walks down through it, then goes to the top of v_2's gadget, walks down
through it, and so forth until it has visited the last vertex of v_k's gadget.
It ends by returning to the first vertex of v_1's gadget. In other words,

H' = (v_1a, v_1b, v_1c, v_2a, v_2b, v_2c, ..., v_ka, v_kb, v_kc, v_1a).

It remains to show that R(<G>) ∈ HAMILTONIAN-CIRCUIT →
<G> ∈ DIRECTED-HAMILTONIAN-CIRCUIT. Notice that, in any graph that R builds,
each b vertex is attached to exactly two edges. So any Hamiltonian circuit
through such a vertex comes either down from the top, or up from the bottom, of
the corresponding gadget. Pick a gadget. If a Hamiltonian circuit through it
goes down from the top, then it must continue to the top of some other gadget.
So it must go down through that one as well. And it must continue through all
the gadgets, in each case going down from the top. Alternatively, it can move
bottom to top through all the gadgets. The key is that it must move in the same
direction through all the vertex gadgets. If R(<G>) ∈ HAMILTONIAN-CIRCUIT, then
the graph G' that R builds must contain at least one Hamiltonian circuit. Pick
one and call it H. Assume that H traverses the G' gadgets top to bottom. (If it
goes in the other direction, then, since G' is undirected, there is another
Hamiltonian circuit through G' that is identical to H except that it moves in
the other direction. Choose it instead.) Note that H can only traverse the
gadget for v and then the gadget for w in case there was a directed edge from v
to w in the original graph G. So, suppose H visits the gadgets for the vertices
(v_1, v_2, ..., v_k, v_1), in that order. Then (v_1, v_2, ..., v_k, v_1) is a
Hamiltonian circuit through G.


We  are  now  in  a  position  to  return  to  the  traveling  salesman  problem,  with  which  we 
began  the  previous  chapter. 


THEOREM  28.23  TSP-DECIDE  is  NP-complete 

Theorem: TSP-DECIDE = {<G, cost> : <G> encodes an undirected graph
with a positive distance attached to each of its edges and G contains a
Hamiltonian circuit whose total cost is less than cost} is NP-complete.

Proof:  We  have  already  shown  (in  Theorem  28.10)  that  TSP-DECIDE  is  in  NP.  It 
remains  to  prove  that  it  is  NP-hard,  which  we  do  with  a  straightforward  reduction 
R  that  shows  that: 

HAMILTONIAN-CIRCUIT ≤P TSP-DECIDE.

Let G be an unweighted, undirected graph with vertices V. R must map G into
a weighted, undirected graph plus a cost. We observe that, if there is a
Hamiltonian circuit through G, it must contain exactly |V| edges. So suppose
that we augment G with edge costs by assigning to every edge a cost of 1. Then,
if there is a Hamiltonian circuit in G, its total cost must be equal to |V|.
Because this is true, we can define R as follows:

R(<G>) =

1. From G construct G', a weighted graph. G' will be identical to G except
that each edge will be assigned the cost 1.

2. Return <G', |V| + 1>.

R runs in polynomial time. And it is correct: every Hamiltonian circuit in G'
has cost exactly |V|, so G has a Hamiltonian circuit iff G' has one whose total
cost is less than |V| + 1.
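A minimal Python sketch of this reduction follows. The names reduce_ham_to_tsp and tour_cost are hypothetical, and edges are stored as frozensets by assumption. Because TSP-DECIDE, as defined in the theorem, asks for a circuit whose cost is strictly less than the bound, the sketch passes the bound |V| + 1, so that a circuit of cost exactly |V| qualifies.

```python
def reduce_ham_to_tsp(vertices, edges):
    """Map an unweighted, undirected graph to a (weighted graph, bound) pair."""
    weighted_edges = {frozenset(e): 1 for e in edges}  # every edge costs 1
    bound = len(vertices) + 1  # strict bound: cost |V| is less than |V| + 1
    return (list(vertices), weighted_edges), bound


def tour_cost(weighted_edges, tour):
    """Cost of a closed tour given as a vertex list, or None if an edge is missing."""
    total = 0
    for x, y in zip(tour, tour[1:]):
        e = frozenset((x, y))
        if e not in weighted_edges:
            return None
        total += weighted_edges[e]
    return total
```

On a 4-cycle, the tour (1, 2, 3, 4, 1) has cost 4, which is below the returned bound of 5. A tour that needs an edge absent from G has no cost at all.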


28.7  The  Relationship  between  P  and  NP-Complete 

So far, every NP language that we have considered has turned out also either to
be in P or to be NP-complete. Is it necessarily true that every NP language has
that property? The answer is no. In fact, unless P = NP, there must exist
languages that don't.

28.7.1  The  Gap  between  P  and  NP-Complete 

Call the class of NP-complete languages NPC. Let NPL = NP - (P ∪ NPC). In other
words, NPL is the limbo area between P and NP-complete. Trivially, if P = NP
then NPL = ∅. But what if (as seems more likely) P ≠ NP? We can prove the
following theorem that tells us that, in that case, NPL is not empty.

THEOREM 28.24 Ladner's Theorem

Theorem: If P ≠ NP, then NPL ≠ ∅.

Proof: The proof relies on the following more general claim that is proved in
[Ladner 1975]:

Claim: Let B be any decidable language that is not in P. There exists a
language D that is in P and that has the following property: Let
A = D ∩ B. Then A ∉ P, A ≤P B, but it is not true that B ≤P A.

Suppose that B is any NP-complete language. Unless P = NP, B is not in P. So
there must exist a language D that is in P, and from which we can compute
A = D ∩ B. A must be in NP since membership in D can be decided in polynomial
time and membership in B can be verified in polynomial time. So the claim
that Ladner proved tells us that:

• A ∉ P, but

• it is not true that B ≤P A. Since B is in NP but is not deterministic,
polynomial-time reducible to A, A is not NP-complete.

So A is an example of an NP language that is neither in P nor NP-complete. Thus
NPL ≠ ∅.


It is possible, using diagonalization techniques, to construct languages that
are in NPL. But it remains true that few “natural” languages are in that class.
A comprehensive catalogue of NP problems [Garey and Johnson 1979] lists three
candidates for membership in NPL:

• COMPOSITES = {w : w is the binary encoding of a composite number}. Recall
that a composite number is a natural number greater than 1 that is not prime.

• LINEAR-PROGRAMMING, which we will describe in Section 28.7.7.

• GRAPH-ISOMORPHISM = {<G_1, G_2> : G_1 is isomorphic to G_2}. Recall that
two graphs G and H are isomorphic to each other iff there exists a way to rename
the vertices of G so that the result is equal to H.

It  is  now  known  that  COMPOSITES  (see  Section  28.1.7)  and  LINEAR- 
PROGRAMMING  (see  Section  28.7.7)  are  in  P. 

The jury is still out on GRAPH-ISOMORPHISM. It is easy to show that
GRAPH-ISOMORPHISM is in NP. A proposed renaming of the vertices of G_1 so that
it matches G_2 is a certificate, which can easily be checked in polynomial time.
Recall that the subgraph isomorphism language, SUBGRAPH-ISOMORPHISM, which asks
whether G_1 is isomorphic to some subgraph of G_2, is NP-complete. It appears
that the graph isomorphism problem is easier, perhaps because we must compare
only G_1 and G_2, not G_1 and all of G_2's subgraphs. But graph isomorphism has
not been shown to be in P, nor has it been shown to be NP-hard (and thus
NP-complete).
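Checking a proposed renaming is straightforward. The sketch below is one way to do it in Python, assuming (these are choices made here, not part of the text) that each graph is given as a vertex list plus a list of undirected edges and that the certificate is a dict from the vertices of G_1 to the vertices of G_2. The function name is hypothetical.

```python
def check_isomorphism(v1, e1, v2, e2, renaming):
    """Return True iff `renaming` maps G_1 = (v1, e1) exactly onto G_2 = (v2, e2)."""
    # The renaming must be defined on every vertex of G_1 and hit every vertex of G_2.
    if set(renaming) != set(v1) or set(renaming.values()) != set(v2):
        return False
    # Equal sizes plus "onto" means the renaming is one-to-one.
    if len(set(v1)) != len(set(v2)):
        return False
    renamed = {frozenset((renaming[a], renaming[b])) for a, b in e1}
    return renamed == {frozenset(e) for e in e2}
```

The check does a constant amount of set work per vertex and per edge, so it runs in time polynomial in the size of the two graphs. Finding a renaming, as opposed to checking one, is the part for which no polynomial-time method is known.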

Problems  like  GRAPH-ISOMORPHISM  are  rare,  though.  So,  most  of  the  time, 
an  NP  problem  will  turn  out  either  to  be  NP-complete  or  to  be  in  P.  The  question 
then  is,  “Which?”  It  is  interesting  to  note  that  sometimes  what  appears  to  be  a 
slight  change  in  a  problem  definition  makes  the  difference  between  a  language  that 
is  in  P  and  one  that  is  NP-complete.  We’ll  next  consider  several  examples  of  this 
phenomenon. 


28.7.2 Two Similar Circuit Problems

Consider the two circuit problems:

• EULERIAN-CIRCUIT, in which we check that there is a circuit that visits every
edge exactly once.

•  HAMILTONIAN-CIRCUIT,  in  which  we  check  that  there  is  a  circuit  that  visits 
every  vertex  exactly  once. 
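To make the contrast concrete, here is a short Python sketch of a polynomial-time test for the first of these problems. It relies on Euler's classical criterion (an undirected graph has an Eulerian circuit iff every vertex has even degree and all of its edges lie in one connected component), which is not restated in this section. The function name, the edge-list input, and the decision to reject a graph with no edges are assumptions of the sketch.

```python
from collections import defaultdict


def has_eulerian_circuit(edges):
    """Decide EULERIAN-CIRCUIT for an undirected graph given as a list of edges."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    if not adj:
        return False  # assumption: a graph with no edges is rejected
    if any(len(nbrs) % 2 for nbrs in adj.values()):
        return False  # some vertex has odd degree
    # Depth-first search: all vertices that have edges must be connected.
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == len(adj)
```

No comparably simple local test is known for HAMILTONIAN-CIRCUIT.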

We have already seen that EULERIAN-CIRCUIT is in P, but HAMILTONIAN-CIRCUIT
is NP-complete.


28.7.3 Two Similar SAT Problems

Define 2-conjunctive normal form (2-CNF) analogously to 3-conjunctive normal
form (3-CNF) except that each clause must contain exactly two literals. So, for
example, (¬P ∨ R) ∧ (S ∨ ¬T) is in 2-conjunctive normal form. Now consider:

• 2-SAT = {<w> : w is a wff in Boolean logic, w is in 2-conjunctive normal form
and w is satisfiable}.

• 3-SAT = {<w> : w is a wff in Boolean logic, w is in 3-conjunctive normal form
and w is satisfiable}.

2-SAT  is  in  P  (which  we  prove  in  Exercise  28.5a).  But  3-SAT  is  NP-complete. 


28.7.4  Two  Similar  Path  Problems: 

Consider the problem of finding the shortest path with no repeated edges through
an unweighted graph G. We can convert this to a decision problem by defining the
language:

SHORTEST-PATH = {<G, u, v, k> : G is an unweighted, undirected graph, u
and v are vertices in G, k ≥ 0, and there exists a path from u to v whose length
is at most k}.

SHORTEST-PATH is in P because the following simple marking algorithm decides
it in O(|<G>|^3) time:

shortest-path(G: graph with vertices V and edges E, u: vertex, v: vertex, k: integer) =

1. Mark u.

2. For i = 1 to min(k, |E|) do:
      For each currently marked vertex n do:
         For each edge from n to some other vertex m do:
            Mark m.

3. If v is marked then accept; else reject.
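The marking algorithm translates almost line for line into Python. In this sketch the function name and the input representation (a vertex list and a list of undirected edges) are assumptions; the set marked plays the role of the marks.

```python
def shortest_path_decide(vertices, edges, u, v, k):
    """Return True iff some path from u to v has length at most k."""
    neighbors = {x: set() for x in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    marked = {u}                                  # step 1
    for _ in range(min(k, len(edges))):           # step 2
        # After i rounds, `marked` holds every vertex within distance i of u.
        marked |= {m for n in marked for m in neighbors[n]}
    return v in marked                            # step 3
```

The bound min(k, |E|) on the number of rounds is safe because no shortest path uses more than |E| edges.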

We should note here that the simple algorithm shortest-path works because we are
considering only unweighted graphs. So it suffices simply to count the number of
edges that are traversed. If, on the other hand, we want to solve the analogous
problem for weighted graphs, the problem is more difficult. But even this
problem can be solved efficiently, for example by using Dijkstra's algorithm.


Finding the shortest path through a weighted graph is important in many
applications. The obvious ones include finding routes on a map or routing
packets through a network. (1.2) But there are many less obvious ones as
well, particularly if we allow weighted edges. For example, consider one
problem that an optical character recognition (OCR) system must solve:
Find the boundaries between letters. One way to think about doing this is
that the goal is to find as straight as possible a path that cuts between the
regions occupied by two characters and that touches as few black pixels as
possible. To solve this problem, we model the boundaries between pixels as
vertices and we add edges that cut through the pixels from one boundary to
another. We assign a weight of one to every edge that cuts through a white
pixel and we assign a very large weight to every edge that cuts through a
black pixel. Then the lowest-cost path between two regions is the most
direct path that cuts through the fewest black pixels.


But  now  consider  the  problem  of  finding  the  longest  path  with  no  repeated  edges 
through  an  unweighted  graph  G.  We  can  convert  this  to  a  decision  problem  by  defining 
the  language: 

LONGEST-PATH = {<G, u, v, k> : G is an unweighted, undirected graph,
u and v are vertices in G, k > 0, and there exists a path with no repeated edges
from u to v whose length is at least k}.

LONGEST-PATH  is  in  NP  (since  a  candidate  path  can  be  checked  in  polynomial 
time).  And  it  can  be  shown  to  be  NP-complete. 
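The polynomial-time check of a candidate path can be sketched as follows. The function name and the representation of a path as a list of vertices are assumptions of this illustration, and G is assumed to be a simple graph.

```python
def check_long_path(edges, u, v, k, path):
    """Verify a LONGEST-PATH certificate: `path` runs from u to v along edges
    of G, repeats no edge, and has length at least k."""
    edge_set = {frozenset(e) for e in edges}
    if not path or path[0] != u or path[-1] != v:
        return False
    used = set()
    for a, b in zip(path, path[1:]):
        e = frozenset((a, b))
        if e not in edge_set or e in used:
            return False  # missing edge, or an edge used twice
        used.add(e)
    return len(used) >= k
```

The verifier looks at each step of the path once, so checking is easy. What appears to be hard is finding a path that is long enough.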


28.7.5 Two Similar Covering Problems:

Recall that a vertex cover (also called a node cover) C of a graph G is a subset
of the vertices of G with the property that every edge of G touches at least one
of the vertices in C. Now define an edge cover C of a graph G to be a subset of
the edges of G with the property that every vertex of G is an endpoint of at
least one of the edges in C. Consider the graph G shown in Figure 28.15. The set
of heavy edges is an edge cover of G. The set of circled vertices is a vertex
cover of it.
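The two definitions are symmetric, and so are the checks for them. The following Python sketch (function names and input representation are assumptions) tests whether a proposed set is a vertex cover or an edge cover of a graph.

```python
def is_vertex_cover(edges, chosen_vertices):
    """Every edge must touch at least one chosen vertex."""
    chosen = set(chosen_vertices)
    return all(a in chosen or b in chosen for a, b in edges)


def is_edge_cover(vertices, edges, chosen_edges):
    """Every vertex must be an endpoint of at least one chosen edge."""
    edge_set = {frozenset(e) for e in edges}
    chosen = [frozenset(e) for e in chosen_edges]
    if any(e not in edge_set for e in chosen):
        return False  # a chosen edge is not an edge of the graph
    touched = set().union(*chosen) if chosen else set()
    return set(vertices) <= touched
```

Both checks run in polynomial time. The difference in difficulty between the two problems lies entirely in finding a smallest cover.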

Consider the problem of finding the smallest edge cover of a graph. We can
convert this to a decision problem by defining the following language:

EDGE-COVER = {<G, k> : G is an undirected graph and there exists an edge
cover of G that contains at most k edges}.

EDGE-COVER can be shown to be in P. (We leave it as an exercise.) But we have
proven that the corresponding vertex-cover language is NP-complete:

• VERTEX-COVER = {<G, k> : G is an undirected graph and there exists a
vertex cover of G that contains at most k vertices}.


FIGURE 28.15 An edge cover and a vertex cover.


28.7.6 Three Similar Map (Graph) Coloring Problems

Consider the problem of coloring a planar map in such a way that no two adjacent
regions (countries, states, or whatever) have the same color. We will allow two
regions that share only a single common point to have the same color. So all of
the map colorings shown in Figure 28.16 are allowed.

We'll say that a map is n-colorable or that it can be colored using n colors iff
it can be colored, according to the rule given above, using no more than n
distinct colors. Now define the following three languages:

• 2-COLORABLE = {<m> : m is a 2-colorable map}.

• 3-COLORABLE = {<m> : m is a 3-colorable map}.

• 4-COLORABLE = {<m> : m is a 4-colorable map}.

What  is  the  complexity  of  each  of  these  three  languages? 

2-COLORABLE is easy. A map is 2-colorable iff it does not contain any point
that is the junction of an odd number of regions. We leave the proof of this
claim as Exercise 28.21. (The proof of a related claim is given as Exercise
A.22.) Map (a) in Figure 28.16 is 2-colorable. Maps (b) and (c) are not. There
is a simple, polynomial-time algorithm to check this requirement. So
2-COLORABLE is in P.

3-COLORABLE is harder. It can be shown to be NP-complete. We leave the
proof of this claim as an exercise.

What about 4-COLORABLE? It turns out that 4-COLORABLE is in P. It can be
decided by the trivial algorithm that simply accepts any map that it is given.
To see why, we'll sketch the history of the 4-color problem.

In 1852, Francis Guthrie noticed that he could color all the maps he was working
with using only four colors. He asked the question, “Can all planar maps be
colored (following the rules described above) using at most four colors?” For
over a hundred years, the answer to this question eluded children and
mathematicians alike. All attempts to find uncolorable maps failed. Yet neither
was there a proof of the 4-color theorem: the claim, articulated by Guthrie,
that no such map exists. A few “proofs” were published, but all were shown to
contain flaws.

Then, in 1976, a proof that has stood the test of time was announced by Kenneth
Appel and Wolfgang Haken. Interestingly (since we are discussing computation), a
computer program played a key role in the development of that proof. Appel and
Haken showed that the question of whether all maps are 4-colorable could be
reduced to a set of about 1700 special cases. So it remained to check all of
them and show that the maps in each case were 4-colorable. Appel and Haken used
a computer to do that. When their proof was published, there was some concern
about the use of a program as part of a proof. What if, for example, the program
were incorrect? In the years since the Appel and Haken proof was published, no
programming errors have been discovered. Newer, simpler proofs have also been
found.


FIGURE 28.16 Legal map colorings.

One reason that the 4-color problem is important is that the coloring question
applies not just to maps. It applies to a wide range of problems that can be
described as graphs. To see why, notice first that a map can be described as an
undirected graph in which the vertices correspond to regions and the edges
correspond to the adjacency relationships between regions. So there will be an
edge between vertices v_1 and v_2 iff the regions that correspond to v_1 and v_2
share a common boundary in the map. Then the map coloring problem becomes the
following graph coloring problem: Given a graph G, assign colors to the vertices
of G in such a way that no pair of adjacent vertices are assigned the same
color. We can define graph equivalents of the three coloring languages that we
defined above.

We  will  define  the  chromatic  number  of  a  graph  to  be  the  smallest  number  of  colors 
required  to  color  its  vertices,  subject  to  the  constraint  that  no  two  adjacent  vertices 
may  be  assigned  the  same  color.  In  the  specific  case  in  which  a  graph  has  a  chromatic 
number  of  two,  we’ll  say  that  the  graph  is  bipartite. 
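Whether a graph can be colored with at most two colors is easy to decide by breadth-first search: color a start vertex, give each newly reached neighbor the opposite color, and fail if an edge ever joins two vertices of the same color. The Python sketch below illustrates this; the function name and the input representation are assumptions. Note that it accepts graphs with no edges, whose chromatic number is one rather than two.

```python
from collections import deque


def two_color(vertices, edges):
    """Return a dict vertex -> 0 or 1 that properly colors the graph, or None."""
    neighbors = {x: [] for x in vertices}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    color = {}
    for start in vertices:          # handle each connected component
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y in neighbors[x]:
                if y not in color:
                    color[y] = 1 - color[x]
                    queue.append(y)
                elif color[y] == color[x]:
                    return None     # an odd cycle: two colors do not suffice
    return color
```

Each vertex and each edge is examined a bounded number of times, so the test runs in polynomial time.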

The  4-color  theorem  tells  us  that  the  chromatic  number  of  any  planar  graph  (i.e., 
one  that  corresponds  to  a  map  on  a  plane)  must  be  less  than  five.  (More  precisely,  a 
graph  is  planar  iff  it  can  be  drawn  in  such  a  way  that  no  edges  cross.)  But,  if  we  do  not 
require  planarity,  there  are  graphs  of  arbitrary  chromatic  numbers.  In  particular,  any 
complete  graph  (i.e.,  one  in  which  there  is  an  edge  between  every  pair  of  vertices)  with 
k  vertices  has  the  chromatic  number  k.  Define  the  following  language: 

CHROMATIC-NUMBER = {<G, k> : G is an undirected graph whose chromatic
number is no more than k}.

CHROMATIC-NUMBER  is  NP-complete. 


Many  optimization  problems  can  be  described  as  graph-coloring  problems. 

We  mention  two  here: 

• Consider the problem of scheduling final exams in such a way that no two
classes that have any common students share an exam time. We can represent
the problem as a graph in which there is a vertex for each class. There is
an edge between every pair of classes that share at least one student. Then
the number of required exam slots is the chromatic number of that graph.

• Consider the problem of assigning trains to platforms. Clearly no two
trains can be assigned to the same platform at the same time. We can
represent the problem as a graph in which there is a vertex for each train.
There is an edge between every pair of trains that are scheduled to be in
the station at the same time. Then the number of required platforms is the
chromatic number of that graph.


Note that CHROMATIC-NUMBER and INDEPENDENT-SET are related.
CHROMATIC-NUMBER relates a graph G to the number of distinct colors that are
required to color it. INDEPENDENT-SET relates G to the largest number of
vertices that can be colored with a single color. So, for example, if the exam
scheduling problem were described as an instance of INDEPENDENT-SET, we'd be
asking about the maximum number of classes that could share a single exam time.


28.7.7  Two  Similar  Linear  Programming  Problems: 

Linear programming problems are optimization problems in which both the
objective function and the constraints that must be satisfied are linear. We can
cast the linear programming problem as a language to be decided by defining:

LINEAR-PROGRAMMING = {<a set of linear inequalities AX ≤ b> : there
exists a vector X of rational numbers that satisfies all of the inequalities}.


Linear  programming  is  used  routinely  to  solve  industrial  resource  allocation 
problems. 


The simplex algorithm, invented by George Dantzig in 1947, solves linear
programming problems (by finding the vector X if it exists). In the worst case,
it may require exponential time. But, in practice, it is highly effective and
substantial work over the years since its invention has further improved its
performance. For example, we mentioned in the introduction to Chapter 27 that it
can be used to solve large instances of the traveling salesman problem. Without
a decision procedure that could be guaranteed to halt in polynomial time,
however, the question of whether LINEAR-PROGRAMMING was in P remained open. In
1979, Leonid Khachian answered the question by exhibiting a new, polynomial
time, linear-programming algorithm. Unfortunately, his algorithm performed worse
in practice than did the simplex algorithm, so it remained of only theoretical
interest. Then, in 1984, Narendra Karmarkar described a polynomial-time,
linear-programming algorithm [Karmarkar 1984] that works well in practice. Both
the simplex algorithm and a variety of techniques based on Karmarkar's
algorithm are commonly used today.

But now consider a slightly different problem in which we require that a
solution be a vector of integers (as opposed to arbitrary rationals). We can
describe this problem as the language:

INTEGER-PROGRAMMING = {<a set of linear inequalities AX ≤ b> : there exists an
integer vector X that satisfies all of the inequalities}.
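A tiny example shows that the two languages really are different. The pair of inequalities 2x ≤ 1 and -2x ≤ -1 forces x = 1/2, so the system has a rational solution but no integer one. The Python sketch below checks a proposed vector against a system AX ≤ b; the function name and the list-of-rows representation of A are assumptions made for illustration, and the search over a small box of integers is only a demonstration, not a decision procedure.

```python
from fractions import Fraction
from itertools import product


def satisfies(A, b, X):
    """Return True iff the vector X satisfies every inequality of A X <= b."""
    return all(
        sum(coeff * x for coeff, x in zip(row, X)) <= bound
        for row, bound in zip(A, b)
    )


A = [[2], [-2]]   # 2x <= 1 and -2x <= -1, i.e. x = 1/2
b = [1, -1]

rational_ok = satisfies(A, b, [Fraction(1, 2)])
integer_ok = any(satisfies(A, b, list(p)) for p in product(range(-5, 6), repeat=1))
```

Here rational_ok is True and integer_ok is False, so this system is in LINEAR-PROGRAMMING but not in INTEGER-PROGRAMMING.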

INTEGER-PROGRAMMING is known to be NP-complete.


28.7.8 A Hierarchy of Diophantine Equation Problems

A Diophantine equation is a polynomial equation in any number of variables, all
with integer coefficients. A Diophantine problem then is, “Given a system of
Diophantine equations, does it have an integer solution?” Depending on the
restrictions that are imposed on the form of a particular problem, it may be
undecidable, decidable but intractable, or tractable (i.e., decidable in
polynomial time).

•  The  general  Diophantine  problem  is  undecidable,  as  we  saw  in  Section  22.1. 

• If the problem is restricted to equations of the form ax^2 + by = c, where a,
b, and c are positive integers and we ask whether there exist integer values of
x and y that satisfy the equation, then the problem becomes decidable. But it is
NP-complete.

• If the problem is restricted to systems in which all the variables are of
degree (exponent) 1 or to equations of a single variable of the form ax^k = c,
and again we ask for integer values of the variable(s), then it is in P.
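To illustrate the easiest case in the list above, here is a Python sketch that decides whether a single-variable equation ax^k = c has an integer solution. The function name is hypothetical, and the sketch assumes a ≠ 0 and k ≥ 1. It first checks that a divides c and then uses binary search to look for an integer k-th root of |c/a|, so the number of search steps grows only with the number of bits in c.

```python
def solve_axk_eq_c(a, k, c):
    """Return an integer x with a * x**k == c, or None if none exists.
    Assumes a != 0 and k >= 1."""
    if c % a != 0:
        return None
    q = c // a
    target = abs(q)
    lo, hi, root = 0, target, None
    while lo <= hi:                 # binary search for an integer k-th root
        mid = (lo + hi) // 2
        power = mid ** k
        if power == target:
            root = mid
            break
        if power < target:
            lo = mid + 1
        else:
            hi = mid - 1
    if root is None:
        return None
    if q >= 0:
        return root
    return -root if k % 2 == 1 else None   # negative q needs an odd exponent
```

For instance, 3x^2 = 48 yields x = 4 and 2x^3 = -54 yields x = -3, while 2x^2 = -8 and 5x^2 = 12 have no integer solutions.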


28.8  The  Language  Class  Co-NP* 

Given a language L that is in NP, can we say anything about whether ¬L is also
in NP? Recall that we are defining the complement of a language to be taken with
respect to the universe of strings with the correct syntax whenever it is
possible to determine that in polynomial time. So, for example,
¬TSP-DECIDE = {w of the form <G, cost>, where <G> encodes an undirected graph
with a positive distance attached to each of its edges and G does not contain a
Hamiltonian circuit whose total cost is less than cost}. Is ¬TSP-DECIDE ∈ NP?
It is not obviously so. For example, the simple technique we used to prove
Theorem 28.1 (that the class P is closed under complement) won't work here. We
cannot simply swap accepting and nonaccepting states since, if there were some
accepting paths and some rejecting paths, there would then still be some
accepting paths and some rejecting ones. So the new machine would accept some
strings that are also accepted by the original one. Because the decidable
languages are closed under complement, we know that we can build a Turing
machine to decide ¬TSP-DECIDE. But the obvious way to do so requires that we
explore all candidate paths in order to verify that none of them is acceptable.
Since the number of candidate paths is O(|<G>|!), we cannot do that in
polynomial time. No alternative approach is known to do significantly better. In
other words, no nondeterministic polynomial time algorithm to decide
¬TSP-DECIDE is known.

In order to have a place to put ¬TSP-DECIDE, we define the class co-NP (i.e.,
the complement of some element of NP) as follows:

The Class co-NP: L ∈ co-NP iff ¬L ∈ NP.

Another  way  to  think  about  the  relationship  between  NP  and  co-NP  is  the  following: 

• A language L is in NP iff a qualifying certificate, i.e., one that proves that
an input string w is in L, can be checked efficiently.

• A language L is in co-NP iff a disqualifying certificate, i.e., one that
proves that an input string w is not in L, can be checked efficiently. For
example, a string of the form <G, cost> is not in ¬TSP-DECIDE if there exists
even one Hamiltonian circuit through G whose cost is less than cost. Checking
such a proposed circuit can easily be done in polynomial time.


EXAMPLE  28.2  Two  Co-NP  Languages:  UNSAT  and  VALID 

Two  important  languages  based  on  properties  of  Boolean  formulas  are  in  co-NP: 

• UNSAT = {<w> : w is a wff in Boolean logic and w is not satisfiable}.
UNSAT is the complement of SAT (since we are taking complements with respect to
the universe of well-formed expressions).

• VALID = {<w> : w is a wff in Boolean logic and w is valid}. Recall that a
wff is valid (equivalently, is a tautology) iff it is true for all assignments
of truth values to the variables it contains. So w is valid iff ¬w is not
satisfiable. Thus we can determine whether a string w is in VALID by
constructing the string ¬w (which can be done in constant time) and then
checking whether ¬w is in UNSAT.
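The relationship among SAT, UNSAT, and VALID can be seen in a few lines of Python. In this sketch a formula is represented, purely as an assumption for illustration, by a Python function from a truth assignment (a dict) to a Boolean, and the function names are hypothetical. Satisfiability is tested by brute force over all 2^n assignments, so the sketch shows the definitions at work and makes no claim about efficiency.

```python
from itertools import product


def is_satisfiable(formula, variables):
    """Try every truth assignment (exponential in the number of variables)."""
    return any(
        formula(dict(zip(variables, values)))
        for values in product([False, True], repeat=len(variables))
    )


def is_unsat(formula, variables):
    """w is in UNSAT iff w is not in SAT."""
    return not is_satisfiable(formula, variables)


def is_valid(formula, variables):
    """w is valid iff its negation is unsatisfiable."""
    negated = lambda env: not formula(env)
    return is_unsat(negated, variables)
```

For example, P ∨ ¬P is valid, P ∧ ¬P is in UNSAT, and P ∨ Q is satisfiable but not valid.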


No  one  knows  whether  NP  is  closed  under  complement.  In  other  words,  we  do  not 
know  whether  NP  =  co-NP.  For  a  variety  of  reasons,  it  is  generally  believed  that 
NP ≠ co-NP. We state two such reasons in the next two theorems.


THEOREM 28.25 If NP ≠ Co-NP then P ≠ NP

Theorem: If NP ≠ co-NP then P ≠ NP.

Proof: From Theorem 28.1, we know that the class P is closed under complement.
If P = NP, then NP must also be closed under complement. If NP ≠ co-NP then
NP is not closed under complement. So it cannot equal P.


We do not know whether NP = co-NP implies that P = NP. It is possible that
NP = co-NP but that that class is nevertheless larger than P.

THEOREM  28.26  NP  =  Co-NP  Iff  There  is  Some  NP-Complete  Language 
whose  Complement  is  also  in  NP 

Theorem: NP = co-NP iff there exists some language L such that L is NP-complete
and ¬L is also in NP.

Proof:  We  prove  the  two  directions  of  the  claim  separately: 

If NP = co-NP then there exists some language L such that L is NP-complete
and ¬L is also in NP: There exists at least one language L (for example, SAT)
that is NP-complete. By definition, ¬L is in co-NP. If NP = co-NP then ¬L must
also be in NP.

If  there  exists  some  language  L  such  that  L  is  NP-complete  and  -X  is  also  in  NP 
then  NP  =  co-NP:  Suppose  that  some  language  L  is  NP-complete  and  -X  is  also 
in  NP.Then  we  can  show  that  NPC  co-NP  and  co-NP  Q  NP: 

• NP ⊆ co-NP: Let L1 be any language in NP. Since, by assumption, L is
NP-complete, there exists a polynomial-time reduction R from L1 to L. R is
also a polynomial-time reduction from ¬L1 to ¬L. Since, by assumption, ¬L
is in NP, there exists a nondeterministic polynomial-time Turing machine M
that decides it. So we can decide ¬L1 in nondeterministic polynomial time by
first running R and then running M. So ¬L1 is in NP and its complement, L1,
is in co-NP. Thus every language in NP is also in co-NP.

• co-NP ⊆ NP: Let L1 be any language in co-NP. Then ¬L1 is in NP. Since, by
assumption, L is NP-complete, there exists a polynomial-time reduction R
from ¬L1 to L. R is also a reduction from L1 to ¬L. Since, by assumption, ¬L
is in NP, there exists a nondeterministic polynomial-time Turing machine M
that decides it. So we can decide L1 in nondeterministic polynomial time by
first running R and then M. So L1 is in NP. Thus every language in co-NP is
also in NP.

Despite  substantial  effort,  no  one  has  yet  found  a  single  language  that  can  be  proven 
to  be  NP-complete  and  whose  complement  can  be  proven  to  be  in  NP. 

28.9  The  Time  Hierarchy  Theorems,  EXPTIME,  and  Beyond 

To  prove  that  a  language  L  has  an  efficient  decision  procedure,  it  suffices  to  exhibit 
such  a  procedure,  prove  its  correctness,  and  analyze  its  complexity.  In  general,  proving 
that no efficient decision procedure exists is much more difficult. We know, however,
that  there  exist  some  languages  that  are  inherently  hard.  We  know  this  for  two  reasons: 

•  There  exists  a  set  of  hierarchy  theorems  that  show  that  adding  resources  (in  terms 
of either time or space) increases the set of languages that can be decided.

•  There  exist  some  specific  decidable  languages  that  can  be  shown  to  be  hard  in  the 
sense  that  no  efficient  algorithm  to  decide  them  exists. 

In the next section, we'll describe the hierarchy theorems and their implications.
Then we will define one new (and larger) time-complexity class and consider one
example of a naturally-occurring language that can be shown to be very hard.


28.9.1 Time Hierarchy Theorems •

There exist two time hierarchy theorems. They formalize the intuitive notion that, as
we allow a Turing machine to use more and more time, the set of languages that can be
decided grows. So, for any fixed time bound, there must be decidable languages that
can be decided within the bound but that cannot be decided using “substantially less”
time. One of the theorems applies to deterministic Turing machines; the other applies
to nondeterministic ones. There is also a corresponding pair of space hierarchy
theorems that make the same case for what happens as the amount of space that can be
used grows.

The hierarchy theorems are important. In particular, they tell us that, while it is
possible that particular pairs of complexity classes may collapse, it is not possible that
all of them do. There are time complexity classes that properly contain other ones (and
similarly for space complexity classes). Unfortunately, there are two kinds of important
questions that the hierarchy theorems cannot answer:

• They do not tell us what languages lie where in the hierarchy. They are proved by
diagonalization  so  they  show  only  that  some  language  must  exist.  They  are  not 
constructive. 

• They do not relate deterministic complexity classes to nondeterministic ones. So,
for example, they say nothing about whether P = NP. They also do not relate time
complexity  classes  to  space  complexity  classes  (such  as  the  ones  we  will  define  in 
the  next  chapter). 

We would like to be able to show that any increase in the amount of time that is
allowed increases the set of languages that can be decided. Unfortunately, we cannot
prove that that is true. The strongest statement that we can prove is that increasing the
amount of time by at least a logarithmic factor makes a difference.

We will state and prove the deterministic version of the time hierarchy theorems.
The nondeterministic version is similar. The proof that we will do will be by
construction of a Turing machine that can do the following two things:

• Compute the value of a timereq function, on a given input, and store that value, in
binary,  on  its  tape. 

• Efficiently simulate another Turing machine for a specified number of steps.

Before  we  state  the  theorem  and  give  its  complete  proof,  we’ll  discuss  how  to  do 
each  of  those  things. 

Time-Constructible  Functions 

Our goal will be to show that, given a function t(n), there exists some language
L_t(n)hard that can be decided in t(n) time but not in "substantially less" time. (We'll
soon see that "substantially less" will mean by a factor of 1/log t(n).) So we will want
to be able to conduct a simulation for at most t(n)/log t(n) steps. We could do that if
we could compute t(n) and write it on the simulator's tape. Then we could divide
that number by log t(n) and use that number as a counter, decrementing it by one
for each simulated step and quitting, even if the simulation hasn't yet halted, if the
counter ever reaches zero. We will need an efficient representation of t(n)'s value.
We could choose to use any base other than one. We will choose to represent the
value in binary. So what we need is the ability to compute t(n) and store the result in
binary. Since n is the length of some Turing machine's input, we can think of that
input as though all of its symbols were 1's.
So we can compute t(n) if we can map the string 1^n to the binary representation of
t(n). We need this computation not to dominate the simulation itself. So we will require
that it be able to be done in O(t(n)) time. So define a function t(n) from the positive
integers to the positive integers to be time-constructible iff:

• t(n) is at least O(n log n), and

• the function that maps the unary representation of n (i.e., 1^n) to the binary
representation of t(n) can be computed in O(t(n)) time.

Most useful functions, as long as they are at least O(n log n), are time-constructible.
For example, all polynomial functions that are at least O(n log n) are
time-constructible. So are n log n, n√n, 2^n, and n!.
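
The input and output convention in the definition above can be illustrated with a short sketch. It assumes the particular choice t(n) = n^2, and the function name is hypothetical. Python's built-in arithmetic stands in for the Turing machine, so the sketch shows only what is mapped to what, not the O(t(n)) step count that the definition requires.

```python
def unary_to_binary_square(unary: str) -> str:
    """Map the unary string 1^n to the binary representation of t(n) = n^2.

    Illustrative only: t(n) = n^2 is an assumed example of a
    time-constructible function.
    """
    if set(unary) - {"1"}:
        raise ValueError("input must consist only of 1's")
    n = len(unary)            # n is the length of the unary input
    return bin(n * n)[2:]     # n^2 in binary, without Python's '0b' prefix


print(unary_to_binary_square("111"))    # 3^2 = 9
print(unary_to_binary_square("11111"))  # 5^2 = 25
```

Note that the binary result has only about 2 log n digits, which is why a counter stored this way is cheap to carry along during a simulation.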

Efficient  Bounded  Simulation 

The  proof  that  we  are  about  to  do  depends  critically  on  the  ability  to  perform  a  bounded 
simulation of one Turing machine by another and to do so efficiently. Any overhead that
occurs  as  part  of  the  simulation  will  weaken  the  claim  that  we  are  going  to  be  able  to 
make  about  the  impact  of  additional  time  on  our  ability  to  decide  additional  languages 
(because  time  that  gets  spent  on  simulation  overhead  doesn’t  get  spent  doing  real  work). 

The  universal  Turing  machine  that  we  described  in  Section  17.7  simulates  the  compu¬ 
tation  of  an  arbitrary  Turing  machine  M  on  an  arbitrary  input  w.  But  it  uses  three  tapes. 

If we simply convert that three-tape machine to a one-tape machine as described in
Section 17.3.1, then a computation that took t(n) steps on the three-tape machine will
take O(t(n)^2) steps on the corresponding one-tape machine. We can do better. If we look
again at the way that the construction of Section 17.3.1 works, we observe that the new,
one-tape machine spends most of its time scanning the simulated tapes. First it scans to
collect the values under all of the read/write heads. And then it scans again to update
each tape in the neighborhood of its read/write head. The fact that the length of any of
the tapes may grow as O(t(n)) is what adds the O(t(n)) factor to the time required by the
simulation. We can avoid that overhead if we can describe a simulator that uses multiple
tapes but that manages them in such a way that it is no longer necessary to scan the
length of each tape at each step.

We are about to describe a simulator BSim that does that. BSim also differs from
the universal Turing machine in that it takes a third parameter, a time bound b. It will
simulate a machine M on input w for b steps or until M halts, whichever comes first.
BSim is otherwise like the universal Turing machine that we have already described. In
particular,  we  will  assume  that  the  Turing  machine  that  BSim  simulates  is  encoded  as 
for  the  universal  Turing  machine.  This  assumption  guarantees  that  BSim  can  simulate 
any  Turing  machine,  regardless  of  the  size  of  its  tape  alphabet. 

BSim  accepts  as  input  a  Turing  machine  M ,  an  input  string  w,  and  a  time  bound  b.  It 
uses  a  single  tape  that  is  divided  into  three  tracks.  (As  in  the  construction  in  Section 
17.3.1,  multiple  tracks  can  be  represented  on  a  single  tape  by  using  a  tape  alphabet  that 
contains  one  symbol  for  each  possible  ordered  3-tuple  of  track  values.)  The  three 
tracks  will  be  used  as  follows: 

• Track 1 will hold the current value of M's tape, along with an indication of where its
read/write head is.

• Track 2 will hold M's current state followed by M's description (i.e., its transition
function).

• Track 3 will hold a counter that is initially set to be the time bound b. As each step
of M is simulated, the counter will be decremented by 1. The simulation will halt if
the counter ever reaches 0 (or if M naturally halts).

The  key  to  BSim's  efficiency  is  that  it  will  keep  the  contents  of  its  three  tracks  lined 
up  so  that  it  can  find  what  it  needs  by  examining  only  a  small  slice  through  the  tracks. 
Suppose that the tracks are as shown in Figure 28.17. The position of M's read/write
head  is  shown  as  a  character  in  bold. 

Each time it needs to make a move, BSim needs to check one square on track 1. It
also needs to check M's state and it needs to examine M's transition function in order
to discover what to do. Because of the way that the tracks are lined up, it can do all of
these things by scanning its tape starting in the position shown in bold (i.e., the square
that corresponds to the current location of M's read/write head). The number of
squares that it must examine on track 2 is a function of the length of M's description,
not the length of its input w or its working tape. So BSim can determine M's next move
in O(|<M>|) steps.

To make M's next move, BSim must then:

• Update track 1 as specified by M's transition function. Doing this requires moving
at most one square on track 1, so it takes constant time.

• Update M's state on track 2. Doing this requires time that is a function of the length
of the state description, which is bounded by |<M>|. So it takes O(|<M>|) time.

• Move the contents of track 2 one square to the right or to the left, depending on
which way M's read/write head moved. Doing this takes time that is a function only
of M. So it also takes O(|<M>|) time.

All that remains is to describe how BSim considers b, the bound it has been given.
Track 3 contains a counter that has been initialized to a string that corresponds to the
binary encoding of b. At each of M's steps, BSim must:

•  Decrement  the  counter  by  1  and  check  for  0. 

• Shift the counter left or right one square so that it remains lined up with M's
read/write head. The number of steps required to do this is a function of the length
of the counter. The maximum value of the counter is the original bound, b. Since
the counter is represented in binary, its maximum length is log b. So this step takes
O(log b) time.

BSim runs M for no more than b steps. Each step takes O(|<M>|) time to do the
computation plus O(log b) time to manage the counter. So BSim can simulate b steps
of M in O(b · (|<M>| + log b)) time.
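
The behavior of BSim (run M on w for at most b steps, or until M halts) can be sketched as follows. This is a minimal illustration under stated assumptions, not the three-track construction: the transition function is a Python dictionary, the tape is a dictionary indexed by position, and the names bsim and EVEN_ONES are hypothetical. The sketch reproduces what BSim computes, not its O(b · (|<M>| + log b)) cost on a single tape.

```python
from typing import Dict, Tuple

# (state, symbol) -> (new state, symbol to write, head move of -1 or +1)
Delta = Dict[Tuple[str, str], Tuple[str, str, int]]


def bsim(delta: Delta, start: str, accept: str, reject: str,
         w: str, b: int, blank: str = "_") -> str:
    """Simulate a deterministic one-tape machine on w for at most b steps."""
    tape = dict(enumerate(w))          # plays the role of track 1
    state, head = start, 0             # current state (track 2) and head position
    counter = b                        # plays the role of track 3
    while state not in (accept, reject):
        if counter == 0:
            return "timeout"           # bound exhausted before M halted
        symbol = tape.get(head, blank)
        if (state, symbol) not in delta:
            return "reject"            # assumption: a missing transition rejects
        state, tape[head], move = delta[(state, symbol)]
        head += move
        counter -= 1                   # one simulated step has been used
    return "accept" if state == accept else "reject"


# Hypothetical example machine: accepts strings with an even number of 1's.
EVEN_ONES: Delta = {
    ("even", "1"): ("odd", "1", +1),
    ("odd", "1"): ("even", "1", +1),
    ("even", "_"): ("acc", "_", +1),
    ("odd", "_"): ("rej", "_", +1),
}

print(bsim(EVEN_ONES, "even", "acc", "rej", "11", 3))   # halts within the bound
print(bsim(EVEN_ONES, "even", "acc", "rej", "11", 2))   # bound too small
```

On input 11 the example machine needs three steps (two to read the 1's and one to read the blank), so a bound of 3 lets it finish and a bound of 2 does not.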
FIGURE 28.17 Lining up the tapes for efficiency.

The Deterministic Time-Hierarchy Theorem

The Deterministic Time-Hierarchy Theorem tells us that changing the amount of
available time by a logarithmic factor makes a difference in what can be done. As we'll see,
the logarithmic factor comes from the fact that the best technique we have for
bounded simulation (as described above) introduces a logarithmic overhead factor. We'll
state the theorem precisely using both O and o notation. Recall that f(n) ∈ o(g(n)) iff,
for every positive c, there exists a positive integer k such that ∀n > k (f(n) < c g(n)).

In  other  words,  for  all  but  some  finite  number  k  of  small  values,  f(n)  <  c  g(n). 

THEOREM  28.27  Deterministic  Time  Hierarchy  Theorem 

Theorem: For any time-constructible function t(n), there exists a language L_t(n)hard
that is deterministically decidable in O(t(n)) time but that is not deterministically
decidable in o(t(n)/log t(n)) time.

Proof: To prove this claim, we will present a technique that, given a function t(n),
finds a language L_t(n)hard that has the properties that we seek. We'll define
L_t(n)hard by describing a Turing machine that decides it in O(t(n)) time. So the
first requirement will obviously be met. The only thing that remains is to design it
so that any other Turing machine that decides it takes at least t(n)/log t(n) time.
We'll use diagonalization to do that. In particular, we'll make sure that the Turing
machine that decides L_t(n)hard behaves differently, on at least one input, than any
Turing machine that runs in o(t(n)/log t(n)) time.

L_t(n)hard will be a language of Turing machine descriptions with a simple string
consisting of a single 1 and then a string of 0's tacked on to the right. More
specifically, every string in L_t(n)hard will have the form <M>10*. The job of the
appendage is to guarantee that L_t(n)hard contains some arbitrarily long strings.

The rest of the definition of L_t(n)hard is difficult to state in words. Instead, we
will define L_t(n)hard by describing a Turing machine M_t(n)hard that decides it:

1. Let n be |w|. Compute t(n). Store the result, in binary, on the tape.

2. Divide that number by log t(n). Store ⌈t(n)/log t(n)⌉, in binary, on the
tape. Call this number b.

3. Check to see that w is of the form <M>10*. If it is not, reject.

4. Check that |<M>| < log b. If it is not, reject.

5. Reformat the tape into the three tracks required by BSim. To do this,
leave w on track 1. Copy M's start state and <M> to track 2 starting at
the left end of w. Copy b to track 3, also starting at the left end of w.

6. Run BSim. In other words, simulate M on w (which is of the form <M>
10*) for ⌈t(n)/log t(n)⌉ steps.

7.  If  M  did  not  halt  in  that  time,  reject. 

8.  If  M  did  halt  and  it  accepted,  reject. 

9.  If  M  did  halt  and  it  rejected,  accept. 
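
The nine steps above can be sketched in Python. The sketch makes several assumptions that are not part of the construction: a machine description is any string written between angle brackets, a hypothetical function decode turns a description into a bounded simulator (standing in for BSim) that returns "accept", "reject", or "timeout", and logarithms are taken base 2. The guard for t(n) < 2 is added only to keep log t(n) positive.

```python
import math
import re
from typing import Callable

# A bounded simulator takes (w, b) and reports "accept", "reject", or "timeout".
BoundedRun = Callable[[str, int], str]


def m_hard(w: str, t: Callable[[int], int],
           decode: Callable[[str], BoundedRun]) -> str:
    """Sketch of the diagonalizing decider described in steps 1-9."""
    n = len(w)
    tn = t(n)                                      # step 1: compute t(n)
    if tn < 2:                                     # guard (sketch only)
        return "reject"
    b = math.ceil(tn / math.log2(tn))              # step 2: the step bound b
    match = re.fullmatch(r"(<[^<>]*>)10*", w)      # step 3: w must be <M>10*
    if match is None:
        return "reject"
    description = match.group(1)
    if not len(description) < math.log2(b):        # step 4: |<M>| < log b
        return "reject"
    outcome = decode(description)(w, b)            # steps 5-6: bounded simulation
    if outcome == "timeout":                       # step 7: M did not halt in time
        return "reject"
    return "reject" if outcome == "accept" else "accept"   # steps 8-9: do the opposite


# Hypothetical toy "machines", keyed by their descriptions.
MACHINES = {
    "<A>": lambda w, b: "accept",     # always accepts
    "<R>": lambda w, b: "reject",     # always rejects
    "<L>": lambda w, b: "timeout",    # never halts within any bound
}


def square(n: int) -> int:
    return n * n


print(m_hard("<A>1" + "0" * 20, square, MACHINES.__getitem__))
print(m_hard("<R>1" + "0" * 20, square, MACHINES.__getitem__))
```

With 20 padding zeros the input has length 24, so b is large enough for the three-character descriptions to pass the check in step 4; the same descriptions with no padding are rejected there.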
The key feature of the way that M_t(n)hard is defined is the following: Whenever
it runs a simulation to completion, it does exactly the opposite of what the
machine it just simulated would have done.

We need to show that L_t(n)hard, the language accepted by M_t(n)hard, can be
decided in O(t(n)) time and that it cannot be decided (by some other Turing
machine) in o(t(n)/log t(n)) time.

We'll first show that M_t(n)hard runs in O(t(n)) time. In a nutshell, on input
<M>10*, M_t(n)hard uses its time to simulate t(n)/log t(n) steps of M, using
O(log t(n)) time for each one. We can analyze it in more detail as follows: Step 1
can be done in O(t(n)) time since t(n) is time-constructible. Step 2 can also be
done in O(t(n)) time. Step 3 can be done in linear time if we just check the most
basic syntax. It isn't necessary, for example, to make sure that all the states in M
are numbered sequentially, even though our description of our encoding scheme
specifies that. Step 4 can be done in O(t(n)) time. The point of this check is to
make sure that the cost of running the simulation in step 6 is dominated by the
total length of w, not by the length of <M>. Step 5 can be done in linear time.

The core of M_t(n)hard is step 6. On input (M, w, b), BSim requires
O(b · (|<M>| + log b)) time. But we have guaranteed (in step 4) that |<M>| < log b.
So BSim requires O(b log b) time. We set b, the number of steps to be simulated,
to t(n)/log t(n). Each simulation step will take O(log t(n)) time. So the total
simulation time will be O(t(n)). Giving a bit more detail, notice that, since b is
t(n)/log t(n), we have:


timereq(BSim) ∈ O( (t(n) · log(t(n)/log t(n))) / log t(n) )

Since t(n) > 1, we have that timereq(BSim) ∈ O(t(n)). Steps 7, 8, and 9 take
constant time. So M_t(n)hard runs in O(t(n)) time.
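
The bound on BSim's running time can be checked by expanding b log b with b = t(n)/log t(n). The derivation below is a sketch that assumes base-2 logarithms and t(n) ≥ 2, so that log log t(n) is nonnegative; it ignores the rounding of b to an integer.

```latex
\begin{align*}
b \log b
  &= \frac{t(n)}{\log t(n)} \cdot \log\!\left(\frac{t(n)}{\log t(n)}\right) \\
  &= \frac{t(n)}{\log t(n)} \cdot \bigl(\log t(n) - \log\log t(n)\bigr) \\
  &\le \frac{t(n)}{\log t(n)} \cdot \log t(n) \;=\; t(n)
  \qquad \text{whenever } \log\log t(n) \ge 0 .
\end{align*}
```

So the O(b log b) cost of the bounded simulation is O(t(n)), as claimed.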

Now we must show that there is no other Turing machine that decides L_t(n)hard
substantially more efficiently than M_t(n)hard does. Specifically, we must show that
no such machine does so in time that is o(t(n)/log t(n)). Suppose that there were
such a machine. We'll call it M_t(n)easy. For any constant c, M_t(n)easy must, on all
inputs of length greater than some constant k, halt in no more than c · t(n)/log t(n)
steps. So, in particular, we can let c be 1. Then, on all inputs of length greater than
some constant k, M_t(n)easy must halt in fewer than t(n)/log t(n) steps.

What we are going to do is to show that M_t(n)easy is not in fact a decider for
L_t(n)hard because it is not equivalent to M_t(n)hard. We can do that if we can show
even one string on which the two machines return different results. That string will
be w = <M_t(n)easy>10^p, for a particular value of p that we will choose so that:


• <M_t(n)easy> is short relative to the entire length of w. Let n be
|<M_t(n)easy>10^p|. Then, more specifically, it must be the case that
|<M_t(n)easy>| < log(t(n)/log t(n)). We require this so that M_t(n)hard will not
reject in step 4. Remember that M_t(n)hard checks for this condition in order to
guarantee that, when BSim runs, the overhead, at each step, of managing the counter
dominates the overhead of scanning M's description. Let m be |<M_t(n)easy>|. Then
this condition will be satisfied if p is at least 2^(2^m). (We leave as an exercise
the proof that this value works.)

• |w| > k. On input w = <M_t(n)easy>10^p, M_t(n)hard will simulate M_t(n)easy on w
for t(|w|)/log t(|w|) steps. For inputs of length at least k, M_t(n)easy is guaranteed
to halt within that many steps. That means that M_t(n)hard will do exactly the
opposite of what M_t(n)easy does. Thus the two machines are not identical. This
condition is satisfied if p is at least k.

So let p be the larger of k and 2^(2^m). On input w = <M_t(n)easy>10^p, the
simulation of M_t(n)easy on <M_t(n)easy>10^p will run to completion. If M_t(n)easy
accepts, M_t(n)hard rejects. And vice versa. This contradicts the assumption that
M_t(n)easy decides L_t(n)hard.


One consequence of the Deterministic Time Hierarchy Theorem is the claim that
we made at the beginning of this chapter, namely that the polynomial time complexity
classes do not collapse. There are languages that are deterministically decidable in
O(n^2) time but not in linear time. And there are languages that are deterministically
decidable in O(n^2000) but not in O(n^1999) time. So there are languages that are in P but
that are not tractable in any useful sense.

Another consequence is that there are languages that are deterministically
decidable in exponential time but not in polynomial time.


28.9.2 EXPTIME

In Section 28.5.1, we suggested that there are languages that are NP-hard but that
cannot be shown to be NP-complete because they cannot be shown to be in NP. The
example that we mentioned was:

• CHESS = {<b> : b is a configuration of an n × n chess board and there is a
guaranteed win for the current player}.

We  can  describe  the  complexity  of  CHESS,  other  “interesting"  games  like  Go,  and 
many  other  apparently  very  difficult  languages,  by  defining  the  class  EXPTIME  as 
follows: 

The Class EXPTIME: For any language L, L ∈ EXPTIME iff there exists some
deterministic Turing machine M that decides L and timereq(M) ∈ O(2^(n^k)) for
some positive integer k.

We  show  that  a  language  is  in  EXPTIME  by  exhibiting  an  algorithm  that  decides  it 
in  exponential  time.  We  sketch  such  an  algorithm  for  chess  (and  other  two  person 
games)  in  N.2.5.  In  general,  if  we  can  describe  an  algorithm  that  decides  L  by  exploring 
all  of  the  paths  in  a  tree  whose  size  grows  exponentially  with  the  size  of  the  input,  then 
L  is  in  EXPTIME. 
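
Exhaustive exploration of a game tree can be illustrated on a toy game. The sketch below assumes a simple subtraction game chosen only for illustration (it is not chess, and the function name is hypothetical): players alternately remove 1, 2, or 3 stones from a pile, and the player who takes the last stone wins. The recursion visits every path in the tree of moves, and the number of paths grows exponentially with the number of stones.

```python
def has_forced_win(stones: int) -> bool:
    """Return True iff the player to move can force a win.

    The player to move wins if some legal move leaves the opponent in a
    position from which the opponent cannot force a win.
    """
    # any() over an empty sequence is False: with no stones left, the
    # player to move has no legal move and has already lost.
    return any(not has_forced_win(stones - take)
               for take in (1, 2, 3) if take <= stones)


for pile in range(9):
    print(pile, has_forced_win(pile))
```

For this particular game, caching the result for each pile size would avoid the exponential blowup. The sketch deliberately omits that shortcut so that it matches the description of exploring all of the paths in the tree.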
As we did for the class NP, we can define a class of equally hard EXPTIME
languages. So we consider two properties that a language L might possess:

1. L is in EXPTIME.

2. Every language in EXPTIME is deterministic, polynomial-time reducible to L.

We'll say that a language is EXPTIME-hard iff it possesses property 2. If, in addition,
it possesses property 1, we'll say that it is EXPTIME-complete. In N.2.3, we'll return to
a discussion of the complexity of CHESS. If we make the assumption that, as we add
rows and columns to the chess board, we also add pieces, then CHESS can be shown to
be EXPTIME-complete.

In Section 29.2, we will define another important complexity class, this time based on
space, rather than time, requirements. The class PSPACE contains exactly those languages
that can be decided by a deterministic Turing machine whose space requirement grows as
some polynomial function of its input. We can summarize what is known about the time
complexity classes P, NP, and EXPTIME, as well as the space complexity class PSPACE as
follows:

• P ⊆ NP ⊆ PSPACE ⊆ EXPTIME.

It is not known which of these inclusions is proper. However, it follows from the
Deterministic Time Hierarchy Theorem that P ≠ EXPTIME. So at least one of them is. It
is thought that all of them are.

A consequence of the fact that P ≠ EXPTIME is that we know that there are
decidable problems for which no efficient (i.e., polynomial time) decision procedure exists. In
particular, this must be true for every EXPTIME-complete problem. So, for example,
CHESS is provably intractable in the sense that no polynomial-time algorithm for it
exists. Practical solutions for EXPTIME-complete problems must exploit techniques
like the approximation algorithms that we describe in Chapter 30.


28.9.3  Harder  Than  EXPTIME  Problems 

Some problems are even harder than the EXPTIME-complete problems, such as
CHESS. We will mention one example.

Recall that, in Section 22.4.2, we proved that the language FOL_theorem = {<A, w> : A
is a decidable set of axioms in first-order logic, w is a sentence in first-order logic, and w
is entailed by A} is not decidable. The proof relied on the fact that there exists at least
one specific first-order theory that is not decidable. In particular, it relied on the theory
of Peano arithmetic, which describes the natural numbers with the functions plus and
times.

The fact that not all first-order theories are decidable does not mean that none of
them is. In particular, we have mentioned the theory of Presburger arithmetic, a theory
of the natural numbers with just the function plus. Presburger arithmetic is decidable.
Unfortunately, it is intractable. [Fischer and Rabin 1974] showed that any algorithm
that decides whether a sentence is a theorem of Presburger arithmetic must have time
complexity at least O(2^(2^(cn))) for some constant c > 0.

28.10 The Problem Classes FP and FNP •

Recall  that: 

• A language L that corresponds to a decision problem Q is in P iff there is a deterministic
polynomial time algorithm that decides, given an arbitrary input x, whether x ∈ L.

• A language L that corresponds to a decision problem Q is in NP iff there is a
deterministic polynomial time verifier that decides, given an arbitrary input x and a
certificate c, whether c is a certificate for x. Equivalently, L is in NP iff there is a
nondeterministic polynomial time algorithm that decides, given an arbitrary input
x, whether there exists a certificate for x.

Now suppose that, instead of restricting our attention to decision problems, we wish
to be able to characterize the complexity of functions whose result may be of any type (for
example, the integers). What we'll actually do is to go one step farther, and define the
following complexity classes for arbitrary binary relations.

The Class FP: A binary relation Q is in FP iff there is a deterministic polynomial
time algorithm that, given an arbitrary input x, can find some y such that (x, y) ∈ Q.

The Class FNP: A binary relation Q is in FNP iff there is a deterministic
polynomial time verifier that, given an arbitrary input pair (x, y), determines whether
(x, y) ∈ Q. Equivalently, Q is in FNP iff there is a nondeterministic polynomial time
algorithm that, given an arbitrary input x, can find some y such that (x, y) ∈ Q.

FP is the functional/relational analog of P: If a relation Q is in FP then it is possible,
in deterministic polynomial time, given a value x, to find a value y such that (x, y) is
in Q. FNP is the functional/relational analog of NP: If a relation Q is in FNP then it is
possible, in deterministic polynomial time, to determine whether a particular ordered
pair (x, y) is in Q.

As before, checking all values is at least as hard as checking a single value. So we have:

FP ⊆ FNP.

But  are  they  equal?  The  answer  is  that  FP  =  FNP  iff  P  =  NP. 

In Section 28.5, we said that a language is NP-hard iff all other languages in NP are
deterministic, polynomial time reducible to it. It is also common to apply the term
“NP-hard” to functions. In this case, we'll say that a function is NP-hard iff its corresponding
decision problem is NP-hard. So, for example:

• The language TSP-DECIDE = {<G, cost> : <G> encodes an undirected graph
with a positive distance attached to each of its edges and G contains a
Hamiltonian circuit whose total cost is less than cost} is NP-complete (and thus NP-hard).
So the function that determines the cost of the lowest cost Hamiltonian circuit in
G is NP-hard.

• Recall that the chromatic number of a graph is the smallest number of colors
required to color its vertices, subject to the constraint that no two adjacent vertices
may be the same color. We defined the language CHROMATIC-NUMBER =
{<G, k> : G is an undirected graph whose chromatic number is no
more than k}. It is NP-complete. So the function that maps a graph to its
chromatic number is NP-hard.

There are, however, problems for which the decision version (i.e., a language to be
decided)  is  easy,  yet  the  function  version  remains  hard.  Probably  the  most  important  of 
these  is  the  following: 

•  The  language  PRIMES  =  {w :  w  is  the  binary  encoding  of  a  prime  number}  is  in 
P.  But  the  problem  of  finding  the  factors  of  a  composite  number  has  no  known 
polynomial  time  solution. 
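
The gap between verifying a pair and finding one can be made concrete with factoring. The sketch below assumes the relation Q = {(n, d) : d is a nontrivial factor of n}; the function names are hypothetical, and trial division is used only as the most naive search, not as a claim about the best known factoring method. Checking a proposed pair takes one division, while the search may try a number of candidates that is exponential in the number of bits of n.

```python
from typing import Optional


def verify_factor(n: int, d: int) -> bool:
    """Polynomial-time verifier: is (n, d) in Q, i.e. is d a nontrivial factor of n?"""
    return 1 < d < n and n % d == 0


def find_factor(n: int) -> Optional[int]:
    """Naive search for some d with (n, d) in Q, by trial division.

    The loop may run about sqrt(n) times, which is exponential in the
    length of the binary encoding of n.
    """
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return None   # n has no nontrivial factor (n is prime, or n < 4)


print(verify_factor(91, 7))   # 91 = 7 * 13
print(find_factor(91))
print(find_factor(97))        # 97 is prime
```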


Exercises 

1. In Section 28.1.5, we described the Seven Bridges of Königsberg problem.
Consider the following modification:


The  good  prince  lives  in  the  castle.  He  wants  to  be  able  to  return  home  from 
the  pub  (on  one  of  the  islands  as  shown  above)  and  cross  every  bridge  exactly 
once  along  the  way.  But  he  wants  to  make  sure  that  his  evil  twin,  who  lives  on  the 
other  river  bank,  is  unable  to  cross  every  bridge  exactly  once  on  his  way  home 
from  the  pub.  The  good  prince  is  willing  to  invest  in  building  one  new  bridge  in 
order  to  make  his  goal  achievable.  Where  should  he  build  his  bridge? 

2. Consider the language NONEULERIAN = {<G> : G is an undirected graph
and G does not contain an Eulerian circuit}.

a. Show an example of a connected graph with 8 vertices that is in
NONEULERIAN.

b.  Prove  that  NONEULERIAN  is  in  P. 

3.  Show  that  each  of  the  following  languages  is  in  P. 

a. WWW = {www : w ∈ {a, b}*}

b. {<M, w> : Turing machine M halts on w within 3 steps}

c.  EDGE-COVER  =  {<G,  k>:G  is  an  undirected  graph  and  there  exists  an 
edge  cover  of  G  that  contains  at  most  k  edges} 

4. In the proof of Theorem B.2, we present the algorithm 3-conjunctiveBoolean,
which, given a Boolean wff w, constructs a new wff w', where w' is in 3-CNF.

a. We claimed that w' is satisfiable iff w is. Prove that claim.

b.  Prove  that  3-conjunctiveBoolean  runs  in  polynomial  time. 

5. Consider the language 2-SAT = {<w> : w is a wff in Boolean logic, w is in
2-conjunctive normal form and w is satisfiable}.

a. Prove that 2-SAT is in P. (Hint: Use resolution, as described in B.1.2.)

b.  Why  cannot  your  proof  from  part  a  be  extended  to  show  that  3-SAT  is  in  P? 

c. Now consider a modification of 2-SAT that might, at first, seem even easier,
since it may not require all of the clauses of w to be simultaneously satisfied.
Let 2-SAT-MAX = {<w, k> : w is a wff in Boolean logic, w is in
2-conjunctive normal form, 1 ≤ k ≤ |C|, where |C| is the number of clauses
in w, and there exists an assignment of values to the variables of w that
simultaneously satisfies at least k of the clauses in w}. Show that 2-SAT-MAX
is NP-complete.

6.  In  Chapter  9,  we  showed  that  all  of  the  questions  that  we  posed  about  regular 
languages are decidable. We'll see, in Section 29.3.3, that while decidable, some
straightforward questions about the regular languages appear to be hard. Some
are easy, however. Show that each of the following languages is in P.

a. DFSM-ACCEPT = {<M, w> : M is a DFSM and w ∈ L(M)}

b. FSM-EMPTY = {<M> : M is an FSM and L(M) = ∅}

c. DFSM-ALL = {<M> : M is a DFSM and L(M) = Σ*}

7.  We  proved  (in  Theorem  28.1)  that  P  is  closed  under  complement.  Prove  that  it  is 
also  closed  under: 

a.  union. 

b.  concatenation. 

c.  Kleene  star. 

8.  It  is  not  known  whether  NP  is  closed  under  complement.  But  prove  that  it  is 
closed  under: 

a.  union. 

b.  concatenation. 

c.  Kleene  star. 

9. If L₁ and L₂ are in P and L₁ ⊆ L ⊆ L₂, must L be in P? Prove your answer.

10.  Show  that  each  of  the  following  languages  is  NP-complete  by  first  showing  that  it 
is  in  NP  and  then  showing  that  it  is  NP-hard. 

a. CLIQUE = {<G, k> : G is an undirected graph with vertices V and edges
E, k is an integer, 1 ≤ k ≤ |V|, and G contains a k-clique}

b. SUBSET-SUM = {<S, k> : S is a multiset (i.e., duplicates are allowed) of
integers, k is an integer, and there exists some subset of S whose elements sum to k}

c. SET-PARTITION = {<S> : S is a multiset (i.e., duplicates are allowed) of
objects, each of which has an associated cost, and there exists a way to divide
S into two subsets, A and S − A, such that the sum of the costs of the
elements in A equals the sum of the costs of the elements in S − A}

d. KNAPSACK = {<S, v, c> : S is a set of objects each of which has an
associated cost and an associated value, v and c are integers, and there exists some
way of choosing elements of S (duplicates allowed) such that the total cost of
the chosen objects is at most c and their total value is at least v}

e. LONGEST-PATH = {<G, u, v, k> : G is an unweighted, undirected graph,
u and v are vertices in G, k ≥ 0, and there exists a path with no repeated
edges from u to v whose length is at least k}

f. BOUNDED-PCP = {<P, k> : P is an instance of the Post Correspondence
problem that has a solution of length less than or equal to k}

11. Let USAT = {<w> : w is a wff in Boolean logic and w has exactly one satisfying
assignment}. Does the following nondeterministic, polynomial-time algorithm
decide USAT? Explain your answer.

decideUSAT(<w>)  = 

1.  Nondeterministically  select  an  assignment  x  of  values  to  the  variables  in  w. 

2. If x does not satisfy w, reject.

3. Else nondeterministically select another assignment y ≠ x.

4. If y satisfies w, reject.

5.  Else  accept. 

12. Ordered binary decision diagrams (OBDDs) are useful in manipulating Boolean
formulas such as the ones in the language SAT. They are described in B.1.3.
Consider the Boolean function f₁ shown there. Using the variable ordering
(x₃ < x₁ < x₂), build a decision tree for f₁. Show the (reduced) OBDD that
createOBDDfromtree will create for that tree.

13.  Complete  the  proof  of  Theorem  28.18  by  showing  how  to  modify  the  proof  of 
Theorem  28.16  so  that  R  constructs  a  formula  in  conjunctive  normal  form. 

14.  Show  that,  if  P  =  NP,  then  there  exists  a  deterministic,  polynomial-time  algo¬ 
rithm  that  finds  a  satisfying  assignment  for  a  Boolean  formula  if  one  exists. 

15. Let R be the reduction from 3-SAT to VERTEX-COVER that we defined in the
proof of Theorem 28.20. Show the graph that R builds when given the Boolean
formula (¬P ∨ Q ∨ T) ∧ (¬P ∨ Q ∨ S) ∧ (T ∨ [literal illegible in source] ∨ S).

16. We'll say that an assignment of truth values to variables almost satisfies a
conjunctive normal form (CNF) Boolean wff with k clauses iff it satisfies at least
k − 1 clauses. A CNF Boolean wff is almost satisfiable iff some assignment
almost satisfies it. Show that the following language is NP-complete.

• ALMOST-SAT = {<w> : w is an almost satisfiable CNF Boolean formula}

17. Show that VERTEX-COVER is NP-complete by reduction from
INDEPENDENT-SET.

18. In Appendix O, we describe the regular expression sublanguage in Perl. Show
that  regular  expression  matching  in  Perl  (with  variables  allowed)  is  NP-hard. 

19. In most route-planning problems, the goal is to find the shortest route that meets
some set of conditions. But consider the following problem (aptly named the
taxicab rip-off problem in [Lewis and Papadimitriou 1998]): Given a directed
graph G with positive costs attached to each edge, find the longest path from
vertex i to vertex j that visits no vertex more than once.

a.  Convert  this  optimization  problem  to  a  language  recognition  problem. 

b. Make the strongest statement you can about the complexity of the resulting
language.

20. In Section 22.3, we introduced a family of tiling problems and defined the
language TILES. In that discussion, we considered the question, “Given a finite set T
of tile designs and an infinite number of copies of each such design, is it possible
to tile every finite surface in the plane?” As we saw, this unbounded version of
the problem is undecidable. Now suppose that we are again given a set T of tile
designs. But, this time, we are also given n² specific tiles drawn from that set. The
question we now wish to answer is, “Given a particular stack of n² tiles, is it
possible to tile an n × n surface in the plane?” As before, the rules are that tiles may
not be rotated or flipped and the abutting regions of every pair of adjacent tiles
must be the same color. So, for example, suppose that the tile set is:


Then a 2 × 2 grid can be tiled as:



a.  Formulate  this  problem  as  a  language,  FINITE-TILES. 

b.  Show  that  FINITE-TILES  is  in  NP. 

c.  Show  that  FINITE-TILES  is  NP-complete  (by  showing  that  it  is  NP-hard). 

21.  In  Section  28.7.6,  we  defined  what  we  mean  by  a  map  coloring. 

a. Prove the claim, made there, that a map is 2-colorable iff it does not contain
any point that is the junction of an odd number of regions. (Hint: Use the
pigeonhole principle.)

b. Prove that 3-COLORABLE = {<m> : m is a 3-colorable map} is in NP.

c. Prove that 3-COLORABLE = {<m> : m is a 3-colorable map} is
NP-complete.


22.  Define  the  following  language. 

•  BIN-OVERSTUFFED  =  {<S,  c,  k>  :  S  is  a  set  of  objects  each  of  which  has 
an  associated  size  and  it  is  not  possible  to  divide  the  objects  so  that  they  fit 
into  k  bins,  each  of  which  has  size  c} 

Explain  why  it  is  generally  believed  that  BIN-OVERSTUFFED  is  not  NP-complete. 


23. Let G be an undirected, weighted graph with vertices V, edges E, and a function
cost(e) that assigns a positive cost to each edge e in E. A cut of G is a subset S of
the vertices in V. The cut divides the vertices in V into two subsets, S and V − S.
Define the size of a cut to be the sum of the costs of all edges (u, v) such that one
of u or v is in S and the other is not. We'll say that a cut is nontrivial iff it is
neither ∅ nor V. Recall that we saw, in Section 28.7.4, that finding shortest paths is
easy (i.e., it can be done in polynomial time), but that finding longest paths is not.
We'll observe a similar phenomenon with respect to cuts.

a. Sometimes we want to find the smallest cut in a graph. For example, it is
possible to prove that the maximum flow between two nodes s and t is equal to
the weight of the smallest cut that includes s but not t. Show that the
following language is in P.

• MIN-CUT = {<G, k> : there exists a nontrivial cut of G with size at
most k}

b.  Sometimes  we  want  to  find  the  largest  cut  in  a  graph.  Show  that  the  following 
language  is  NP-complete. 

• MAX-CUT = {<G, k> : there exists a cut of G with size at least k}

c. Sometimes, when we restrict the form of a problem we wish to consider, the
problem becomes easier. So we might restrict the maximum-cut problem to
graphs where all edge costs are 1. It turns out that, in this case, the “simpler”
problem remains NP-complete. Show that the following language is
NP-complete.

• SIMPLE-MAX-CUT = {<G, k> : all edge costs in G are 1 and there
exists a cut of G with size at least k}

d.  Define  a  bisection  of  a  graph  G  to  be  a  cut  where  S  contains  exactly  half  of 
the  vertices  in  V.  Show  that  the  following  language  is  NP-complete.  (Hint: 
The  graph  G  does  not  have  to  be  connected.) 

•  MAX-BISECTION  =  {<G,  k>  :  G  has  a  bisection  of  size  at  least  k } 

24.  Show  that  each  of  the  following  functions  is  time-constructible. 

a.  n  log  n 

b. n√n

c.  n  * 

d. 2ⁿ

e. n!

25. In the proof of Theorem 28.27 (the Deterministic Time Hierarchy Theorem), we
had to construct a string w of the form <M_t(n)easy>10^p. Let n be
|<M_t(n)easy>10^p|. One of the constraints on our choice of p was that it be long
enough that |<M_t(n)easy>| < log(t(n)/log t(n)). Let m be |<M_t(n)easy>|. Then we
claimed that the condition would be satisfied if p is at least 2^(2^m). Prove this claim.

26.  Prove  or  disprove  each  of  the  following  claims. 

a. If A [reduction symbol illegible in source] B and B ∈ P, then A ∈ P.

b. If A [reduction symbol illegible in source] B and B and C are in NP, then A ∪ C ∈ NP.

c. Let ndtime(f(n)) be the set of languages that can be decided by some
nondeterministic Turing machine in time O(f(n)). Every language in ndtime(2ⁿ) is
decidable.

d. Define a language to be co-finite iff its complement is finite. Any co-finite
language is in NP.

e. Given an alphabet Σ, let A and B be nonempty proper subsets of Σ*. If both
A and B are in NP then A [reduction symbol illegible in source] B.

f. Define the language MANY-CLAUSE-SAT = {<w> : w is a Boolean wff
in conjunctive normal form, w has m variables and k clauses, and k ≥ 2^m}. If
P ≠ NP, MANY-CLAUSE-SAT ∈ P.

In the last chapter, we analyzed problems with respect to the time required to decide
them. In this chapter, we'll focus instead on space requirements.

29.1  Analyzing  Space  Complexity 

Our  analysis  of  space  complexity  begins  with  the  function  spacereq(M)  as  described  in 
Section  27.4.2: 

• If M is a deterministic Turing machine that halts on all inputs, then the value of
spacereq(M) is the function f(n) defined so that, for any natural number n, f(n) is
the maximum number of tape squares that M reads on any input of length n.

• If M is a nondeterministic Turing machine all of whose computational paths halt on
all inputs, then the value of spacereq(M) is the function f(n) defined so that, for any
natural number n, f(n) is the maximum number of tape squares that M reads on
any path that it executes on any input of length n.

So,  just  as  timereq(M)  measures  the  worst-case  time  requirement  of  M  as  a  function 
of  the  length  of  its  input, spacereq(M)  measures  the  worst-case  space  requirement  of  M 
as  a  function  of  the  length  of  its  input. 

29.1.1  Examples 

To  begin  our  discussion  of  space  complexity,  we’ll  return  to  three  of  the  languages  that 
we  examined  in  the  last  chapter:  CONNECTED,  SAT,  and TSP-DECIDE. 

EXAMPLE  29.1  Connected 

We begin by showing that CONNECTED = {<G> : G is an undirected graph
and G is connected} can be decided by a deterministic Turing machine that uses
linear space. Recall that a graph is connected iff there exists a path from each
vertex to each other vertex.

Theorem 28.4 tells us that CONNECTED is in P. The proof exploited an
algorithm that we called connected. Connected works by starting at G's first vertex and
following edges, marking vertices as they are visited. If every vertex is eventually
marked, then G is connected; otherwise it isn't. In addition to representing G,
connected uses space for:

• storing the marks on the vertices: This can be done by adding one extra bit to
the representation of each vertex.

• maintaining the list L of vertices that have been marked but whose successors
have not yet been examined: We didn't describe how L is maintained. One
easy way to do it is to add to the representation of each vertex one extra bit,
which will be 1 if that vertex is a member of L and 0 otherwise.

• the number marked-vertices-counter: Since the value of the counter cannot exceed
the number of vertices of G, it can be stored in binary in log(|<G>|) bits.

So spacereq(connected) is O(|<G>|).
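
The bookkeeping just listed can be sketched in Python. This is only an illustration under stated assumptions, not the Turing machine from the proof of Theorem 28.4: the graph is assumed to be given as a list of vertices and a list of undirected edges whose endpoints all appear in that list, the names marked, pending and marked_vertices_counter are hypothetical, and the empty graph is treated as connected by convention.

```python
def connected(vertices, edges):
    """Return True iff the undirected graph (vertices, edges) is connected.

    Beyond the graph itself, the only storage is one 'marked' bit and one
    'member of L' bit per vertex, plus a single counter.
    """
    if not vertices:
        return True  # assumption: the empty graph counts as connected
    marked = {v: False for v in vertices}   # one mark bit per vertex
    pending = {v: False for v in vertices}  # one 'member of L' bit per vertex
    first = vertices[0]
    marked[first] = pending[first] = True
    marked_vertices_counter = 1
    while True:
        # Pick any vertex that is marked but whose successors are unexamined.
        current = next((v for v in vertices if pending[v]), None)
        if current is None:
            break
        pending[current] = False
        for a, b in edges:
            if a == current:
                neighbor = b
            elif b == current:
                neighbor = a
            else:
                continue
            if not marked[neighbor]:
                marked[neighbor] = pending[neighbor] = True
                marked_vertices_counter += 1
    return marked_vertices_counter == len(vertices)
```

The two dictionaries play the role of the two extra bits per vertex, so the extra storage grows linearly with the number of vertices.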


CONNECTED is an “easy” language both from the perspective of time and the
perspective of space since it can be decided in polynomial time and polynomial (in fact
linear) space. Next we consider a language that appears to be harder if we measure time
but is still easy if we measure only space.


EXAMPLE  29.2  SAT 

Consider SAT = {<w> : w is a wff in Boolean logic and w is satisfiable}. SAT is
in NP, so it can be decided in polynomial time by a nondeterministic Turing machine
that, given a wff w, guesses at an assignment of values to its variables. Then it
checks whether that assignment makes w True. The checking procedure (outlined
in the proof of Theorem 28.12) requires no space beyond the space required to
encode w. It can overwrite the variables of w with their assigned values. Then it
can evaluate subexpressions and replace each one with the value T or F. So SAT
can be decided by a nondeterministic Turing machine that uses linear space.

SAT is believed not to be in P. No deterministic, polynomial-time algorithm is
known for it. But it can be decided by a deterministic, polynomial-space algorithm
that works as follows:

decideSATdeterministically(<w>) =

1. Lexicographically enumerate the rows of the truth table for w. For each row do:

1.1. Evaluate w (by replacing the variables with their values and applying
the operators to those values, as described above).

1.2. If w evaluates to True, accept.

2. If no row of the truth table caused w to be True, reject.

Each step of this procedure requires only linear space. But what about the
space that may be required to control the loop? When we analyze algorithms to
determine their space requirements, we must be careful to include whatever space
is used by a loop index or, more significantly, a stack if one is used. For example,
consider a recursive implementation of decideSATdeterministically that, at each
invocation, evaluates w if all variables have had values assigned. Otherwise, it
picks an unassigned variable, assigns it a value, and then recurs. This
implementation could require a stack whose depth equals the number of variables in w. Each
stack entry would need a copy of w. Since the number of variables can grow
linearly with |w|, we'd have that spacereq(decideSATdeterministically) is O(|w|²).
That's polynomial, but not linear.

Fortunately, in the case of decideSATdeterministically, it is possible to control
the loop using only an amount of space that is linear in the number of variables in
w. Let 0 correspond to False and 1 correspond to True. Assign an order to the n
variables in w. Then each row of w's truth table is a binary string of length n. Begin
by generating the string 0ⁿ. At each step, use binary addition to increment the
string by 1. Halt once the assignment that corresponds to 1ⁿ has been evaluated.

Using this technique, we have that spacereq(decideSATdeterministically) is
O(|w|). So SAT can also be decided by a deterministic Turing machine that uses
linear space.
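
The counter technique can be sketched in Python as follows. This is a minimal illustration rather than a Turing machine: it assumes, purely for convenience, that the wff is supplied as a Python function wff that maps a list of n Boolean values to True or False, and the name decide_sat_deterministically is hypothetical.

```python
def decide_sat_deterministically(wff, n):
    """Decide whether wff, a function of n Boolean variables, is satisfiable.

    The loop is controlled by a single n-bit string that is incremented in
    place, so the control state is linear in the number of variables.
    """
    row = [0] * n  # the string 0^n: every variable False
    while True:
        if wff([bit == 1 for bit in row]):
            return True  # accept: this row of the truth table satisfies wff
        # Binary addition of 1, in place; row[-1] is the least significant bit.
        i = n - 1
        while i >= 0 and row[i] == 1:
            row[i] = 0
            i -= 1
        if i < 0:
            return False  # the row 1^n has just been evaluated: reject
        row[i] = 1
```

For example, decide_sat_deterministically(lambda a: a[0] and not a[1], 2) tries the rows 00, 01 and 10 and accepts on the third. No stack is needed, since the only state carried between iterations is the current row.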


EXAMPLE  29.3  TSP-DECIDE 

Consider TSP-DECIDE = {<G, cost> : <G> encodes an undirected graph
with a positive distance attached to each of its edges and G contains a Hamiltonian
circuit whose total cost is less than cost}. We showed, in Theorem 28.10, that
TSP-DECIDE is in NP. To prove the theorem, we described a nondeterministic Turing
machine, TSPdecide, that decides the language TSP-DECIDE by
nondeterministically attempting to construct a circuit one step at a time, checking, at each step,
to see that the circuit's total cost is less than cost. TSPdecide uses space to store
the partial circuit and its cost. No Hamiltonian circuit can be longer
than the list of edges in G, since no edge may appear twice. So the space required
to store a partial circuit is a linear function of |<G>|. The machine halts if the
cost so far exceeds cost, so the space required to store the cost so far is bounded
by cost. Thus we have that spacereq(TSPdecide) is O(|<G>|).

But TSPdecide is nondeterministic. How much space would be required by a
deterministic machine that decides TSP-DECIDE? We can define such a machine
as follows:

decideTSPdeterministically(<G, cost>) =

1. Set circuit to contain just vertex 1.

2. If explore(<G, 0, circuit>) returns True then accept, else reject.

The bulk of the work is then done by the recursive procedure explore, which
takes a partial circuit as input. It uses depth-first search to see whether it is possible
to extend that circuit into one that is complete and whose cost is less than cost. Each
call to explore extends the circuit by one edge. Explore is defined as follows:

explore(<G,cost,circuit>)  = 

1. If circuit is complete and its cost is less than cost, return True.

2. If circuit is complete and its cost is not less than cost, return False.

3. If circuit is not complete then do: /* Try to extend it. */

4. For each edge e that is incident on the last vertex of circuit, or until a return
statement is executed, do:

4.1. If the other vertex of e is not already part of circuit or if it would
complete circuit then:

Call explore(<G, cost + cost of e, circuit with e added>).

If the value returned is True then return True.

5. No alternative returned True. So return False.

DecideTSPdeterministically works by recursively invoking explore. It needs
space to store the stack that holds the individual invocations of explore, including
their arguments. Some paths may end without considering a complete circuit, but
the maximum depth of the stack is |V| + 1, since that is the number of vertices in
any complete circuit. Each stack record needs space to store a cost and a complete
circuit, whose length is |V| + 1.

So we have that spacereq(decideTSPdeterministically) is O(|<G>|²). We can
actually do better and decide TSP-DECIDE using only linear space by storing, at
each invocation of explore, just a cost and the one new vertex that is added at that
step. Thus, while we know of no deterministic Turing machine that can decide
TSP-DECIDE in polynomial time, there does exist one that can decide it in
polynomial (in fact, linear) space.
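
The linear-space variant can be sketched in Python. This is an illustration under stated assumptions rather than the machine described above: vertices are assumed to be numbered 0 to n − 1, the hypothetical argument dist maps frozenset({u, v}) to the positive distance of edge (u, v), and graphs with fewer than three vertices are rejected because they cannot contain a Hamiltonian circuit. Each recursive call adds just one vertex to a shared circuit list and carries just one cost.

```python
def decide_tsp_deterministically(n, dist, cost):
    """Return True iff some Hamiltonian circuit has total cost less than cost.

    n:    number of vertices, numbered 0 .. n-1 (assumed encoding)
    dist: dict mapping frozenset({u, v}) to a positive edge distance
    """
    if n < 3:
        return False  # assumption: no Hamiltonian circuit on fewer than 3 vertices
    start = 0
    circuit = [start]        # shared: one entry per level of the recursion
    visited = [False] * n    # one bit per vertex
    visited[start] = True

    def explore(cost_so_far):
        last = circuit[-1]
        if len(circuit) == n:
            # Every vertex is on the path; try to close the circuit.
            d = dist.get(frozenset((last, start)))
            return d is not None and cost_so_far + d < cost
        for v in range(n):
            d = dist.get(frozenset((last, v)))
            if d is None or visited[v]:
                continue
            if cost_so_far + d >= cost:
                continue  # distances are positive, so this branch cannot succeed
            visited[v] = True
            circuit.append(v)
            if explore(cost_so_far + d):
                return True
            circuit.pop()        # undo the extension and reuse the space
            visited[v] = False
        return False

    return explore(0)
```

Because the circuit is shared and undone on backtracking, the space in use at any moment is one vertex and one cost per level of recursion, and the recursion depth never exceeds n.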


29.1.2 Relating Time and Space Complexity

The examples that we have just considered suggest that there is some relationship
between the number of steps a Turing machine executes and the amount of space it
uses. The most fundamental relationship between the two numbers arises from the fact
that, at each step of its operation, a Turing machine can examine at most one tape
square. So we have, for any Turing machine M, that spacereq(M) ≤ timereq(M).

But M's time requirement cannot be arbitrarily larger than its space requirement. We
are considering only Turing machines that halt on all inputs. If a Turing machine M halts,
then it can never re-enter a configuration that it has been in before. (If it did, it would be
in an infinite loop.) So the number of steps that M can execute is bounded by the
number of distinct configurations that it can enter. We can compute the maximum number
of such configurations as follows: Let K be the states of M and let Γ be its tape alphabet.
M may be in any one of its |K| states. Define M's active tape to be the smallest tape
fragment that contains all the nonblank symbols plus the read/write head. Assuming
that spacereq(M) ≥ n (i.e., that M actually reads all its input), the number of squares in
M's active tape at any point during M's computation is bounded by spacereq(M). Each
of those squares may hold any one of the |Γ| tape symbols. So the maximum number of
distinct tape snapshots is |Γ|^spacereq(M). And M's read/write head may be on any one of
the spacereq(M) tape squares. So the maximum number of distinct configurations that
M can enter is:


MaxConfigs(M) = |K| · |Γ|^spacereq(M) · spacereq(M).

Let c be a constant such that c > |Γ|. Then:

MaxConfigs(M) ∈ O(c^spacereq(M)).

(We leave the proof of this claim as an exercise.) Using the analysis we have just
presented, we can prove the following theorem:

THEOREM  29.1  Relating  Time  and  Space  Requirements 

Theorem: Given a Turing machine M = (K, Σ, Γ, δ, s, H) and assuming that
spacereq(M) ≥ n, the following relationships hold between M's time and space
requirements:

spacereq(M) ≤ timereq(M) ∈ O(c^spacereq(M)).

Proof:  Spacereq(M)  is  bounded  by  timereq(M)  since  M  must  use  at  least  one  time 
step  for  every  tape  square  it  visits. 

The upper bound on timereq(M) follows from the fact that, since M halts, the number
of steps that it can execute is bounded by the number of distinct configurations that
it can enter. That number is given by the function MaxConfigs(M), as described
above. Since MaxConfigs(M) ∈ O(c^spacereq(M)), timereq(M) ∈ O(c^spacereq(M)).

In a nutshell, space can be reused. Time cannot.
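
A small numeric sketch may help make the gap between the two bounds concrete. It is an illustration only, under loudly stated assumptions: counter_steps is a hypothetical stand-in for a machine that counts from 0ⁿ to 1ⁿ in place, and it treats each bit write as one step, which is a simplification (a real Turing machine would also spend steps moving its head). The function max_configs simply evaluates the MaxConfigs formula for given values of |K|, |Γ| and spacereq(M).

```python
def max_configs(num_states, alphabet_size, space):
    """Evaluate |K| * |Gamma|^space * space, the MaxConfigs bound."""
    return num_states * alphabet_size ** space * space


def counter_steps(n):
    """Count from 0^n through 1^n in place.

    Returns (cells_used, bit_writes). The same n cells are reused for every
    value, while the number of writes grows exponentially with n.
    """
    tape = [0] * n
    writes = 0
    while True:
        i = n - 1
        while i >= 0 and tape[i] == 1:
            tape[i] = 0      # carry: clear a trailing 1
            writes += 1
            i -= 1
        if i < 0:
            return n, writes  # the counter overflowed past 1^n: stop
        tape[i] = 1
        writes += 1
```

With n = 3 the counter uses 3 cells but performs 14 writes, and in general it performs 2^(n+1) − 2 writes on n cells. The space is reused on every increment, whereas each step, once taken, is spent.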


29.2 PSPACE, NPSPACE, and Savitch's Theorem

If  our  measure  of  complexity  is  time,  it  appears  that  nondeterminism  adds  power.  So, 
for  example,  there  are  languages,  such  as  SAT  and  TSP-DECIDE,  that  are  in  NP  but 
that  do  not  appear  to  be  in  P.  When  we  change  perspectives  and  measure  complexity  in 
terms  of  space  requirements,  the  distinction  between  nondeterministic  and  determinis¬ 
tic  machines  turns  out  almost  to  disappear. 

Recall  that  we  defined  the  language  class  P  to  include  exactly  those  languages  that 
could  be  decided  by  a  deterministic  Turing  machine  in  polynomial  time.  And  we  defined 
the  class  NP  to  include  exactly  those  languages  that  could  be  decided  by  a 
nondeterministic Turing machine in polynomial time. We'll now define parallel classes
based on space requirements.

The Class PSPACE: L ∈ PSPACE iff there exists some deterministic Turing machine
M that decides L and spacereq(M) ∈ O(n^k) for some constant k.

The Class NPSPACE: L ∈ NPSPACE iff there exists some nondeterministic Turing
machine M that decides L and spacereq(M) ∈ O(n^k) for some constant k.

Savitch's Theorem, which we'll state and prove next, tells us that we needn't have
bothered to define the two classes: PSPACE = NPSPACE.

Note that, since every deterministic Turing machine is also a legal nondeterministic
one, if a language L can be decided by some deterministic Turing machine that requires
f(n) space, then it can also be decided by some nondeterministic Turing machine that
requires at most f(n) space. The other direction does not follow. It may be possible to
decide L with a nondeterministic Turing machine that uses just f(n) space but there
may exist no deterministic machine that can do it without using more than O(f(n))
space. However, it turns out that L's deterministic space complexity cannot be much
worse than its nondeterministic space complexity. We are about to prove that,
assuming one common condition is satisfied, there must exist a deterministic Turing machine
that decides it using O(f(n)²) space.

The proof that we will do is by construction. We'll show how to transform a
nondeterministic Turing machine into an equivalent deterministic one that conducts a
systematic search through the set of “guesses” that the nondeterministic machine could
have made. That's exactly what we did for TSP-DECIDE above. In that case, we were
able to construct a deterministic Turing machine that conducted a straightforward
depth-first search and that required only O(n²) space to store its stack. But we exploited
a specific property of TSP-DECIDE to make that work: We knew that any Hamiltonian
circuit through a graph with |V| vertices must have exactly |V| edges. So the depth of
the stack was bounded by |V| and thus by |<G>|.

In general, while there is a bound on the depth of the stack, it is much weaker. We
can guarantee only that, if a nondeterministic Turing machine M uses spacereq(M)
space, then any one branch of a depth-first deterministic Turing machine that simulates
M must halt in no more than MaxConfigs(M) steps (since otherwise it is in a loop). But
MaxConfigs(M) ∈ O(c^spacereq(M)). We can't afford a stack that could grow that deep.

There is, however, an alternative to depth-first search that can be guaranteed to
require a stack whose depth is O(n). We'll use it in the proof of Savitch's Theorem, which
we state next.

THEOREM  29.2  Savitch's  Theorem 

Theorem: If L can be decided by some nondeterministic Turing machine M and
spacereq(M) ≥ n, then there exists a deterministic Turing machine M' that also
decides L and spacereq(M') ∈ O(spacereq(M)²).

Proof: We require that spacereq(M) ≥ n, which means just that M must at least be
able to read all its input. In Section 29.4, we'll introduce a way to talk about
machines that use less than linear working space. Once we do that, this constraint
can be weakened to spacereq(M) ≥ log n.

The proof is by construction. Suppose that L is decided by some nondeterministic
Turing machine M. We will show an algorithm that builds a deterministic Turing
machine M' that also decides L and that uses space that grows no faster than the
square of the space required by M. M' will systematically examine the paths that M
could pursue. Since our goal is to put a bound on the amount of space that M' uses,
the key to its construction is a technique for guaranteeing that the stack that it uses
to manage its search doesn't get “too deep.” We'll use a divide-and-conquer
technique in which we chop each problem into two halves in such a way that we can
solve the first half and then reuse the same space to solve the second half.

To simplify the question that we must answer, we'll begin by changing M so
that, whenever it is about to accept, it first blanks out its tape. Then it enters a
new, unique, accepting state. Call this new machine M_blank. Note that M_blank
accepts iff it ever reaches the configuration in which its tape is blank and it is
in the new accepting state. Call this configuration c_accept. M_blank uses no
additional space, so spacereq(M_blank) = spacereq(M).

Now we must describe the construction of M', which, on input w, must accept iff M_blank, on input w, can (via at least one of its computational paths) reach c_accept. Because we need to bound the depth of the stack that M' uses, we need to bound the number of steps it can execute (since it might have to make a choice at each step). We have already seen that simple approaches, such as depth-first search, cannot do that adequately. So we'll make use of the following function, canreach. Its job is to answer the more general question, "Given a Turing machine T running on input w, two configurations, c1 and c2, and a number t, could T, if it managed to get to c1, go on and reach c2 within t steps?" Canreach works by exploiting the following observation: If T can go from c1 to c2 within t steps, then one of the following must be true:

1. t = 0. In this case, c1 = c2.

2. t = 1. In this case, c1 ⊢_T c2. (Recall that ⊢_T is the yields-in-one-step relation between configurations of machine T.) Whether the single required step exists can be determined just by examining the transitions of T.

3. t > 1. In this case, c1 ⊢_T ... ⊢_T ck ⊢_T ... c2. In other words, there is some (at least one) configuration ck that T goes through on the way from c1 to c2. Furthermore, note that, however many configurations there are on the path from c1 to c2, there is a "middle" one, i.e., one with the property that half of T's work is done getting from c1 to it and the other half is done getting from it to c2. (It won't matter that, if the length of the computation is not an even number, there may be one more configuration on one side of the middle one than there is on the other.)

So canreach operates as follows: If t = 0, all it needs to do is to check whether c1 = c2. If t = 1, it just checks whether the one required transition exists in T. If t > 1, then it considers, as a possible "middle" configuration, all configurations of T that use no more space than spacereq(T) allows for inputs of length |w|. It will recursively invoke itself and ask whether T could both go from c1 to middle in t/2 steps and from middle to c2 in the remaining t/2 steps. (Since t/2 may not be an integer, we'll give each invocation ⌈t/2⌉ steps, where ⌈t/2⌉ is the ceiling of t/2, i.e., the smallest integer that is greater than or equal to t/2.) For this approach to work, we must be able to guarantee that there is only a finite number of configurations that T could enter while processing w. We are only going to invoke canreach on deciding Turing machines, so we know not only that the number of such configurations is finite but that it is bounded by MaxConfigs(T), which is a function of the number of states in T, the size of its tape alphabet, and spacereq(T), which bounds the number of tape squares that can be nonblank as a function of the length of w.

Canreach will take five arguments: a Turing machine T, an input string w, a pair of configurations, c1 and c2, and a nonnegative integer that corresponds to the number of time steps that T may use in attempting to get from c1 to c2. Note that T and w won't change as canreach recursively invokes itself. Also note that the only role w plays is that its length determines the number of tape squares that can be used. Canreach can be defined as follows:

canreach(T: Turing machine, w: string, c1: configuration, c2: configuration, t: nonnegative integer) =

1. If c1 = c2 then return True.

2. If t = 1 then:

   2.1. If c1 ⊢_T c2 then return True. /* c1 yields c2 in one step.

   2.2. Else return False. /* In one step, c1 cannot yield c2.

3. If t > 1, then let Confs be the set of all of T's configurations whose tape is no longer than spacereq(T) applied to |w|. For each configuration middle in Confs do:

   3.1. If canreach(T, w, c1, middle, ⌈t/2⌉) and canreach(T, w, middle, c2, ⌈t/2⌉) then return True.

4. Return False. /* None of the possible middles worked.
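
The recursion in canreach can be tried out on a small example. The following Python sketch is only an illustration under assumptions that are not part of the text: a configuration is abstracted to any hashable value, the machine T is replaced by a hypothetical successor function `step`, and the list `confs` plays the role of Confs. The names `EDGES`, `toy_step`, and `TOY_CONFS` are made up for the example.

```python
from typing import Callable, Hashable, Sequence, Set

Config = Hashable  # assumption: a stand-in for a full Turing machine configuration


def canreach(step: Callable[[Config], Set[Config]],
             confs: Sequence[Config],
             c1: Config, c2: Config, t: int) -> bool:
    """Divide-and-conquer reachability, following steps 1-4 of canreach."""
    if c1 == c2:                       # step 1
        return True
    if t == 1:                         # step 2: one transition must exist
        return c2 in step(c1)
    if t > 1:                          # step 3: try every candidate middle
        half = (t + 1) // 2            # ceiling of t/2
        for middle in confs:
            if (canreach(step, confs, c1, middle, half)
                    and canreach(step, confs, middle, c2, half)):
                return True
    return False                       # step 4


# Hypothetical toy "machine": a nondeterministic successor relation.
EDGES = {0: {1, 5}, 1: {2}, 2: {3}, 3: {4}}


def toy_step(c: Config) -> Set[Config]:
    """Successors of configuration c in the toy relation."""
    return EDGES.get(c, set())


TOY_CONFS = list(range(6))
```

With these definitions, `canreach(toy_step, TOY_CONFS, 0, 4, 4)` succeeds because the path 0, 1, 2, 3, 4 has four steps, while a bound of 2 is too small. Because each half receives ⌈t/2⌉ steps, a bound that is not a power of 2 may be rounded up slightly across the two halves. That is harmless in the proof, where the bound is only an upper limit on the number of steps.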

We can now return to our original problem: Given a nondeterministic Turing machine M, construct a deterministic Turing machine M' such that L(M') = L(M) and spacereq(M') ∈ O(spacereq(M)^2). The following algorithm solves the problem:

builddet(M: nondeterministic Turing machine) =

1. From M, build M_blank as described above.

2. From M_blank, build M'. To make it easy to describe M', define:

   • c_start to be the start configuration of M_blank on input w.

   • max-on-w to be the result of applying the function MaxConfigs(M_blank) to |w|. (So max-on-w is the maximum number of distinct configurations that M_blank might enter when started on w. Thus it is also the maximum number of steps that M_blank might execute on input w, given that it eventually halts.)

3. Then M'(w) operates as follows:

   If canreach(M_blank, w, c_start, c_accept, max-on-w) then accept, else reject.

Canreach will return True iff M_blank (and thus M) accepts w. So L(M') = L(M).

But it remains to show that spacereq(M') ∈ O(spacereq(M)^2).

Each invocation of canreach requires storing an activation record. It suffices to store M and w once. But each record must contain a new copy of two configurations and the integer that puts a bound on the number of steps to be executed. Each configuration requires O(spacereq(M)) space, and the step bound, written in binary, requires O(log2(max-on-w)) space, which we show below is also O(spacereq(M)). So each invocation record requires O(spacereq(M)) space. Now all we need to do is to determine the depth of the stack of invocation records. Notice that each invocation of canreach cuts the allotted number of steps in half. So the depth of the stack is O(log2(max-on-w)). But max-on-w is MaxConfigs(M) applied to |w| and we have that:

MaxConfigs(M) ∈ O(c^spacereq(M)).
log2(MaxConfigs(M)) ∈ O(spacereq(M)).

So the depth of the stack is O(spacereq(M)) and the total space required is O(spacereq(M)^2).
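
The claim about the depth of the stack can be checked with a little arithmetic. The sketch below is an illustration only, and the function name `canreach_depth` is made up for it. It counts how many nested invocations occur when the step bound starts at t and each nested call receives ⌈t/2⌉ steps.

```python
def canreach_depth(t: int) -> int:
    """Number of nested canreach invocations for an initial step bound t >= 1."""
    depth = 1                  # the outermost call
    while t > 1:
        t = (t + 1) // 2       # each nested call gets ceil(t/2) steps
        depth += 1
    return depth
```

For every t ≥ 1 the result is ⌈log2 t⌉ + 1, so a step bound of 2^20 needs only 21 stacked records. When t is MaxConfigs(M) ∈ O(c^spacereq(M)), this depth is O(spacereq(M)), which is the bound used in the proof.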


Savitch's Theorem has an important corollary, which we state next.

THEOREM 29.3 PSPACE = NPSPACE

Theorem: PSPACE = NPSPACE.

Proof: In one direction, the claim is trivial: If L is in PSPACE, then it must also be in NPSPACE because the deterministic Turing machine that decides it in polynomial space is also a nondeterministic Turing machine that decides it in polynomial space.

To prove the other direction, we note that Savitch's Theorem tells us that the price for going from a nondeterministic machine to a deterministic one is at most a squaring of the amount of space required. More precisely, if L is in NPSPACE then there is some nondeterministic Turing machine M such that M decides L and spacereq(M) ∈ O(n^k) for some k. If k ≥ 1, then, by Savitch's Theorem, there exists a deterministic Turing machine M' such that M' decides L and spacereq(M') ∈ O(n^(2k)). If, on the other hand, k < 1 then, using the same construction that we used in the proof of Savitch's Theorem, we can show that there exists a deterministic M' such that M' decides L and spacereq(M') ∈ O(n^2). In either case, spacereq(M') is a polynomial function of n. So L can be decided by a deterministic Turing machine whose space requirement is some polynomial function of the length of its input. Thus L is in PSPACE.


Another corollary of Savitch's Theorem follows.

THEOREM 29.4 P ⊆ NP ⊆ PSPACE

Theorem: P ⊆ NP ⊆ PSPACE.

Proof: Theorem 28.14 tells us that P ⊆ NP. It remains to show that NP ⊆ PSPACE. If a language L is in NP, then it is decided by some nondeterministic Turing machine M in polynomial time. In polynomial time, M cannot use more than polynomial space since it takes at least one time step to visit a tape square. Since M is a nondeterministic Turing machine that decides L in polynomial space, L is in NPSPACE. But, by Savitch's Theorem, PSPACE = NPSPACE. So L is also in PSPACE.


It is assumed that both subset relationships are proper (i.e., that P ≠ NP ≠ PSPACE), but no proof of either of those claims exists.

29.3  PSPACE-Completeness 

Recall that, in our discussion of time complexity, we introduced two useful language families: We said that a language is NP-hard iff every language in NP is deterministic, polynomial-time reducible to it. And we said that a language is NP-complete iff it is NP-hard and it is also in NP. All NP-complete languages are equivalently hard in the sense that all of them can be decided in nondeterministic, polynomial time and, if any one of them can also be decided in deterministic polynomial time, then all of them can.

In  our  attempt  to  understand  why  some  problems  appear  harder  than  others,  it  is 
useful  to  define  corresponding  classes  based  on  space  complexity.  So  we  consider  the 
following  two  properties  that  a  language  L  might  possess. 

1.  L  is  in  PSPACE. 

2.  Every  language  in  PSPACE  is  deterministic,  polynomial-time  reducible  to  L. 

Using  those  properties,  we  will  define: 

The Class PSPACE-hard: L is PSPACE-hard iff it possesses property 2.

The Class PSPACE-complete: L is PSPACE-complete iff it possesses both property 1 and property 2. All PSPACE-complete languages can be viewed as being equivalently hard in the sense that all of them can be decided in polynomial space and:

• If any PSPACE-complete language is also in NP, then all of them are and NP = PSPACE.

• If any PSPACE-complete language is also in P, then all of them are and P = NP = PSPACE.

Note that we have defined PSPACE-hardness, just as we defined NP-hardness, with respect to polynomial-time reducibility. We could have defined it in terms of the space complexity of the reductions that we use. But the polynomial-time definition is more useful because it provides a stronger notion of a "computationally feasible" reduction. If all we knew about two languages L1 and L2 were that L1 were polynomial-space reducible to L2, an efficient (i.e., polynomial-time) solution to L2 would not guarantee an efficient solution to L1. The efficiency of the solution for L2 might be swamped by a very inefficient reduction from L1 to L2. By continuing to restrict our attention to deterministic, polynomial-time reductions, we guarantee that if L1 is reducible to L2 and an efficient solution to L2 were to be found, we would also have an efficient solution for L1.

When we began our discussion of NP-completeness, we faced a serious problem at the outset: How could we find our first NP-complete language? Once we had that one, we could prove that other languages were also NP-complete by reduction from it. We face the same problem now as we begin to explore the class of PSPACE-complete languages. We need a first one.

Recall that, in the case of NP-completeness, the language that got us going, and that provided the basis for the proof of the Cook-Levin Theorem, was SAT (the language of satisfiable Boolean formulas). The choice of SAT was not arbitrary. To prove that it was NP-complete, we exploited the expressive power of Boolean logic to describe computational paths. Since every NP language is, by definition, decided by some nondeterministic Turing machine each of whose paths halts in a finite (and polynomially-bounded) number of steps, we were able to define a reduction from an arbitrary NP language L to the specific NP language SAT by showing a way to build, given a deciding machine M for L and a string w, a Boolean formula whose length is bounded by a polynomial function of the length of w and that is satisfiable iff M accepts w.

Perhaps we can, similarly, seed the class of PSPACE-complete languages with a logical language. Because we believe that PSPACE includes languages that are not in NP, we wouldn't expect SAT to work. On the other hand, we can't jump all the way to a first-order logic language like FOL_theorem = {<A, w> : A is a decidable set of axioms in first-order logic, w is a sentence in first-order logic, and w is entailed by A}, since it isn't decidable at all, much less is it decidable in polynomial space. In the next section, we will define a new language, QBF, that adds quantifiers to the language of Boolean logic but that stops short of the full power of first-order logic. Then we will show that QBF is PSPACE-complete. We'll do that using a construction that is similar to the one that was used to prove the Cook-Levin Theorem. We'll discover one wrinkle, however: In order to guarantee that, on input w, the length of the formula that we build is bounded by some polynomial function of |w|, we will need to use the divide-and-conquer technique that we exploited in the proof of Savitch's Theorem.


29.3.1 The Language QBF

Boolean formulas are evaluated with respect to the universe {True, False}. A particular Boolean well-formed formula (wff), such as ((P ∧ Q) ∨ ¬R) → S, is a function, stated in terms of some finite number of Boolean variables. Given a particular set of values as its input, it returns either True or False. We have defined Boolean-formula languages, like SAT, in terms of properties that the formulas that are in the language must possess. So, for example:

• A wff w ∈ SAT iff it is satisfiable. In other words, w ∈ SAT iff there exists some set of values for the variables of w such that w evaluates to True.

• A wff w ∈ VALID iff it is a tautology. In other words, w ∈ VALID iff for all values for the variables of w, w evaluates to True.
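
The "there exists" and "for all" in these two definitions can be made concrete with a brute-force truth-table check. The following Python sketch is only an illustration. It assumes, as a convenience that is not part of the text, that a wff is given as a Python function from an assignment (a dictionary of variable values) to a Boolean. All of the names in it are made up for the example.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Sequence

Assignment = Dict[str, bool]


def assignments(variables: Sequence[str]) -> Iterator[Assignment]:
    """Yield every row of the truth table over the given variables."""
    for values in product([True, False], repeat=len(variables)):
        yield dict(zip(variables, values))


def is_satisfiable(wff: Callable[[Assignment], bool],
                   variables: Sequence[str]) -> bool:
    """True iff there exists an assignment under which wff is True."""
    return any(wff(a) for a in assignments(variables))


def is_valid(wff: Callable[[Assignment], bool],
             variables: Sequence[str]) -> bool:
    """True iff wff is True under all assignments."""
    return all(wff(a) for a in assignments(variables))


def example(a: Assignment) -> bool:
    """The wff ((P and Q) or not R) -> S, written with Python operators."""
    return (not ((a["P"] and a["Q"]) or not a["R"])) or a["S"]
```

Under this encoding, `example` is satisfiable (set S to True) but not valid (set P and Q to True and S to False). Both checks enumerate 2^n rows for n variables, so they illustrate the definitions and are not meant as efficient procedures.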

So,  while  Boolean  formulas  do  not  contain  quantifiers,  we  have  used  quantification 
in  our  descriptions  of  Boolean-formula  languages.  Now  suppose  that  we  add  explicit 
quantifiers  to  the  logical  language  itself. 

Define the language of quantified Boolean expressions as follows:

• The base case: All wffs are quantified Boolean expressions.

• Adding quantifiers: If w is a quantified Boolean expression that contains the unbound variable A, then the expressions ∃A (w) and ∀A (w) are quantified Boolean expressions. Exactly as we do in first-order logic, we'll then say that A is bound in w and that the scope of the new quantifier is w.

All of the following are quantified Boolean expressions.

• (P ∧ ¬R) → S

• ∃P ((P ∧ ¬R) → S)

• ∀R (∃P ((P ∧ ¬R) → S))

• ∀S (∀R (∃P ((P ∧ ¬R) → S))).

Notice that, because of the way they are constructed, all quantified Boolean expressions are in prenex normal form, as defined in B.2.1. In other words, the expression is composed of a quantifier list followed by a quantifier-free matrix. We'll find this form useful below.

As in first-order logic, we'll say that a quantified Boolean expression is a sentence iff all of its variables are bound. So, for example, ∀S (∀R (∃P ((P ∧ ¬R) → S))) is a sentence, but none of the other expressions listed above is.

A quantified Boolean formula is a quantified Boolean expression that is also a sentence. Every quantified Boolean formula, just like every sentence in first-order logic, can be evaluated to produce either True or False. For example:

• ∃P (∃R (P ∧ ¬R)) evaluates to True.

• ∃P (∀R (P ∧ ¬R)) evaluates to False.

We can now define the language that will turn out to be our first PSPACE-complete language:

• QBF = {<w> : w is a true quantified Boolean formula}.


29.3.2  QBF  is  PSPACE-Complete 

QBF, unlike languages like FOL_theorem that are defined with respect to full first-order logic, is decidable. The reason is that the universe with respect to which existential and universal quantification are defined is finite. In general, we cannot determine the validity of an arbitrary first-order logic formula, such as ∀x (P(x)), by actually evaluating P for all possible values of x. The domain of x might, for example, be the integers. But it is possible to decide whether an arbitrary quantified Boolean formula is true by exhaustively examining its (finite) truth table.

We’ll  show  next  that  not  only  is  it  possible  to  decide  QBF,  it  is  possible  to  decide  it 
in  polynomial  (in  fact  linear)  space. 

THEOREM 29.5 QBF is in PSPACE

Theorem: QBF = {<w> : w is a true quantified Boolean formula} is in PSPACE.

Proof: We show that QBF is in PSPACE by exhibiting a deterministic, polynomial-space algorithm that decides it. The algorithm that we present exploits the fact that quantified Boolean formulas are in prenex normal form. So, if w is a quantified Boolean formula, it must have the following form, where each Q is a quantifier (either ∀ or ∃) and f is a Boolean wff:

Qv1 (Qv2 (Qv3 ... (Qvn (f)) ...)).

The following procedure, QBFdecide, decides whether w is true. It peels off the quantifiers one at a time, left-to-right. Each time it peels off a quantifier that binds some variable v, it substitutes True for every instance of v and calls itself recursively. Then it substitutes False for every instance of v and calls itself recursively again. At some point, it will be called with a Boolean wff that contains only constants. When that happens, the wff can simply be evaluated.

QBFdecide(<w>) =

1. Invoke QBFcheck(<w>).

2. If it returns True, accept; else reject.

QBFcheck(<w>) =

1. If w contains no quantifiers, evaluate it by applying its Boolean operators to its constant values. The result will be either True or False. Return it.

2. If w is ∀v (w'), where w' is some quantified Boolean formula, then:

   2.1. Substitute True for every occurrence of v in w' and invoke QBFcheck on the result.

   2.2. Substitute False for every occurrence of v in w' and invoke QBFcheck on the result.

   2.3. If both of these invocations return True, then w' is true for all values of v. So return True; else return False.

3. If w is ∃v (w'), where w' is some quantified Boolean formula, then:

   3.1. Substitute True for every occurrence of v in w' and invoke QBFcheck on the result.

   3.2. Substitute False for every occurrence of v in w' and invoke QBFcheck on the result.

   3.3. If at least one of these invocations returns True, then w' is true for some value of v. So return True; else return False.
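
To see the procedure run, here is a minimal Python sketch of QBFcheck. It makes several assumptions that are not part of the text: formulas are encoded as nested tuples such as ("forall", "R", body) and ("and", x, y), each variable is bound by exactly one quantifier, and, instead of literally substituting constants into the formula, the sketch records the chosen value of each variable in a dictionary `env`. That swap has the same effect as substitution and keeps one small entry per open quantifier.

```python
from typing import Dict, Optional, Tuple, Union

Formula = Union[bool, Tuple]   # assumed encoding: nested tuples or constants


def evaluate(wff: Formula, env: Dict[str, bool]) -> bool:
    """Evaluate a quantifier-free wff under the current variable values."""
    if isinstance(wff, bool):
        return wff
    op = wff[0]
    if op == "var":
        return env[wff[1]]
    if op == "not":
        return not evaluate(wff[1], env)
    if op == "and":
        return evaluate(wff[1], env) and evaluate(wff[2], env)
    if op == "or":
        return evaluate(wff[1], env) or evaluate(wff[2], env)
    if op == "implies":
        return (not evaluate(wff[1], env)) or evaluate(wff[2], env)
    raise ValueError(f"unknown operator: {op!r}")


def qbf_check(w: Formula, env: Optional[Dict[str, bool]] = None) -> bool:
    """Return True iff the quantified Boolean formula w is true."""
    env = {} if env is None else env
    if isinstance(w, tuple) and w[0] in ("forall", "exists"):
        quantifier, v, body = w
        env[v] = True                        # steps 2.1 / 3.1
        when_true = qbf_check(body, env)
        env[v] = False                       # steps 2.2 / 3.2
        when_false = qbf_check(body, env)
        del env[v]                           # forget v once both branches are done
        if quantifier == "forall":
            return when_true and when_false  # step 2.3
        return when_true or when_false       # step 3.3
    return evaluate(w, env)                  # step 1: no quantifiers left
```

For instance, with `matrix = ("and", ("var", "P"), ("not", ("var", "R")))`, the call `qbf_check(("exists", "P", ("exists", "R", matrix)))` corresponds to ∃P (∃R (P ∧ ¬R)), and replacing the inner quantifier with "forall" gives ∃P (∀R (P ∧ ¬R)).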

We analyze the space requirement of QBFdecide as follows: The depth of QBFcheck's stack is equal to the number of variables in w, which is O(|w|). At each recursive call, the only new information is the value of one new variable. So the amount of space for each stack entry is constant. The actual evaluation of a variable-free wff w can be done in O(|w|) space. Thus the total space used by QBFdecide is O(|w|). QBFdecide runs in linear (and thus obviously polynomial) space.

We can't prove that a more efficient algorithm for deciding QBF doesn't exist. But the result that we will prove next strongly suggests that none does. In particular, it tells us that a nondeterministic, polynomial-time algorithm exists only if NP = PSPACE and a deterministic, polynomial-time algorithm exists only if P = NP = PSPACE.

THEOREM 29.6 QBF is PSPACE-Complete

Theorem: QBF = {<w> : w is a true quantified Boolean formula} is PSPACE-complete.

Proof: We have already shown that QBF is in PSPACE. So all that remains is to show that it is PSPACE-hard. We'll do that by showing a polynomial-time reduction to it from any language in PSPACE. We'll use approximately the same technique that we used, in the proof of the Cook-Levin Theorem, where we showed that SAT is NP-hard. Let L be any language in PSPACE. L is decided by some deterministic Turing machine M with the property that spacereq(M) is a polynomial. We'll describe a reduction from L to QBF that works by constructing a quantified Boolean formula that describes the computation of M on input w and that is true iff M accepts w.

Just as we did in the proof of the Cook-Levin Theorem, we'll use Boolean variables to describe each of the configurations that M enters while processing w. Our first idea might be simply to construct a Boolean formula exactly as we described in the Cook-Levin Theorem proof. Then we can convert that formula into a quantified Boolean formula by binding each of its variables by an existential quantifier. The resulting quantified Boolean formula will be true iff the original formula is satisfiable.

It remains to analyze the time complexity of the construction. The number of steps required by the construction is a polynomial function of the length of the formula that it constructs. The length of the formula is polynomial in the number of cells in the table that describes the computation of M on w. Each row of the table corresponds to one configuration of M, so the number of cells in a row is O(spacereq(M)), which is polynomial in |w|.

But now we have a problem. In the proof of the Cook-Levin Theorem, we knew that the maximum length of any computational path of M was O(|w|^k). So the maximum number of configurations that would have to be described, and thus the number of rows in the table, was also O(|w|^k). The problem that we now face is that we no longer have a polynomial bound on the number of configurations that M may enter before it halts. All we have is a polynomial bound on the amount of space M uses. Using that space bound, we can construct a time bound, as we've done before, by taking advantage of the fact that M may not enter a loop. So the maximum number of steps it may execute is bounded by the maximum number of distinct configurations it may enter. That number is

MaxConfigs(M) = |K| · |Γ|^spacereq(M) · spacereq(M) ∈ O(c^spacereq(M)).

So, if we used exactly the same technique we used in the proof of the Cook-Levin Theorem, we'd be forced to describe the computation of M on w with a formula whose length grows exponentially with |w|. A polynomial-time reduction cannot build an exponentially long formula.
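
A few numbers make the size of the problem concrete. The sketch below simply evaluates the bound |K| · |Γ|^spacereq(M) · spacereq(M) for given values. The function name `max_configs` is made up for the illustration, and any machine sizes plugged into it are assumptions, not values from the text.

```python
def max_configs(num_states: int, tape_alphabet_size: int, space: int) -> int:
    """|K| * |Gamma|**space * space: a bound on distinct configurations."""
    return num_states * tape_alphabet_size ** space * space
```

For a hypothetical machine with 10 states and a tape alphabet of size 3, going from 20 to 40 tape squares multiplies the bound by 2 · 3^20, which is more than 10^9. A table with one row per configuration would grow at that rate.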

The solution to this problem is to exploit quantifiers to "cluster" subexpressions so that a whole group of them can be described at once.

To do this, we'll begin by returning to the divide-and-conquer technique that we used in our proof of Savitch's Theorem. As we did there, we will again solve a more general problem. This time, we will describe a technique for constructing a quantified Boolean formula f(c1, c2, t) that is true iff M can get from configuration c1 to configuration c2 in at most t steps. We again observe that, if M can get from configuration c1 to configuration c2 within t steps, then one of the following must be true:

1. t = 0. In this case, c1 = c2. Since each configuration can be described by a formula whose length is polynomial in |w|, this condition can also be described by such a formula.

2. t = 1. In this case, c1 yields c2 in a single step. Using the techniques we used to build Conjunct 4 in the proof of the Cook-Levin Theorem, this condition can also be described by a formula whose length is polynomial in |w|. Note that, in the proof of the Cook-Levin Theorem, we built a Boolean formula and then asked whether it was satisfiable. Now we build the same Boolean formula and then bind all the variables with existential quantifiers, so we again ask whether any values satisfy the formula.

3. t > 1. In this case, c1 yields c2 in more than one step. Then there is some configuration we'll call middle with the property that M can get from c1 to middle within ⌈t/2⌉ steps and from middle to c2 within another ⌈t/2⌉ steps. Of course, as in the proof of Savitch's Theorem, we don't know what middle is. But, when we build f(c1, c2, t), we can use an existential quantifier to assert that it exists. The resulting formula will only be true if, in fact, middle does exist.

Now we just need a space-efficient way to represent f(c1, c2, t) in the third case. Suppose that middle exists. Then some set of Boolean variables m1, m2, ... describe it. So we could begin by writing:

f(c1, c2, t) = ∃m1 (∃m2 ... (f(c1, middle, ⌈t/2⌉) ∧ f(middle, c2, ⌈t/2⌉)) ...).

We can simplify this by introducing the following shorthand:

• If c is a configuration that is described by the variables c1, c2, ..., let ∃c (p) stand for ∃c1 (∃c2 ... (p) ...) and let ∀c (p) stand for ∀c1 (∀c2 ... (p) ...).

Note that since the number of variables required to describe c is polynomial in |w|, the length of the expanded formula is a polynomial function of the length of the shorthand one.

This lets us rewrite our definition of f as:

f(c1, c2, t) = ∃middle (f(c1, middle, ⌈t/2⌉) ∧ f(middle, c2, ⌈t/2⌉)).

Then we could recursively expand f(c1, middle, ⌈t/2⌉) and f(middle, c2, ⌈t/2⌉), continuing until ⌈t/2⌉ becomes 0 or 1. We cut the number of computation steps that might have to be described in half each time we do this recursion. But we also replace a single formula by the conjunction of two formulas. So the total length of the formula that we'll build, if we take this approach, is O(t).

Unfortunately, it becomes obvious that this approach isn't efficient enough as soon as we return to the original, specific problem of describing the computation of M on w. As we did in the proof of Savitch's Theorem, we'll actually work with M_blank, a modification of M that accepts by entering a unique accepting configuration that we'll call c_accept. Let c_start be the starting configuration of M on w. We know that the number of steps that M may execute in getting from c_start to c_accept is O(c^spacereq(M)) and that spacereq(M) is polynomial in |w|. The formula that we must build is then:

f(c_start, c_accept, 2^(k·spacereq(M))).

So its length grows exponentially with |w|.
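
The doubling that causes this growth is easy to count. The following sketch is an illustration only, and the function name `naive_base_count` is made up for it. It counts how many base-case subformulas (those with a step bound of 0 or 1) appear when f(c1, c2, t) is expanded into a conjunction of two subformulas, each with step bound ⌈t/2⌉.

```python
def naive_base_count(t: int) -> int:
    """Base-case subformulas in the naive two-way expansion of f(c1, c2, t)."""
    if t <= 1:
        return 1
    return 2 * naive_base_count((t + 1) // 2)   # two copies, each for ceil(t/2)
```

When t is a power of 2 the count is exactly t, so a step bound of 2^(k·spacereq(M)) produces that many base-case subformulas, which is exponential in spacereq(M).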

To reduce its size, we'll exploit universal quantifiers to enable us to describe the two recursively generated subformulas of f(c1, c2, t) as a single formula. To do this, we need to create a new, generic formula that describes the transition, within ⌈t/2⌉ steps, from an arbitrary configuration we'll call c3 to another arbitrary configuration we'll call c4. The names don't matter as long as we describe the two configurations with variables that are distinct from all the variables that we will use to describe actual configurations of M. So the new formula is f(c3, c4, ⌈t/2⌉). Then we'll want to say that the new formula must be true both when:

• c3 = c1 and c4 = middle, and

• c3 = middle and c4 = c2.

We'll do that by saying that it must be true for all (i.e., both) of those assignments of values to the variables of c3 and c4. To do that, we need the following additional shorthands.

• Let ∀(x, y)(p) stand for ∀x (∀y (p)).

• Let ∀x ∈ {s, t}(p) stand for ∀x ((x = s ∨ x = t) → p).

• Combining these, let ∀(x, y) ∈ {(s1, s2), (t1, t2)}(p) say that ((x = s1 ∧ y = s2) → p) ∧ ((x = t1 ∧ y = t2) → p).

Note that the length of the expanded version of one of these shorthands grows at most polynomially in the length of the shortened form. With these conventions in hand, we can now offer a new way to begin to define f:

f(c1, c2, t) = ∃middle (∀(c3, c4) ∈ {(c1, middle), (middle, c2)} (f(c3, c4, ⌈t/2⌉))).

We're still using the convention that a configuration name stands for the entire
collection of variables that describe it. So this formula asserts that there is
some configuration middle such that $f(c_3, c_4, \lceil t/2 \rceil)$ is true both when:

• the variables in $c_3$ take on the values of the variables in $c_1$ and the variables in
$c_4$ take on the values of the variables in middle, and

• the variables in $c_3$ take on the values of the variables in middle and the variables
in $c_4$ take on the values of the variables in $c_2$.
Now we must recursively define $f(c_3, c_4, \lceil t/2 \rceil)$ and we must continue the
recursive definition process until $\lceil t/2 \rceil = 0$ or 1.

If we do that, how long is $f(c_{start}, c_{accept}, 2^{k \cdot \mathit{spacereq}(M)})$? The answer is that the
number of recursive steps is $\log_2(2^{k \cdot \mathit{spacereq}(M)})$, which is $O(\mathit{spacereq}(M))$. And
now the length of the subformula that is added at each step is also
$O(\mathit{spacereq}(M))$. So the total length of the formula, and thus the amount of time
required to construct it, is $O(\mathit{spacereq}(M)^2)$. So we have described a technique
that can reduce any language in PSPACE to QBF in polynomial time.
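
The difference between the two constructions can be seen by counting symbols. The following is a minimal sketch, not part of the original proof: it assumes that each level of the recursion contributes a fixed `glue` cost and each base case a fixed `base` cost (both names are hypothetical; in the actual construction the per-level cost is $O(\mathit{spacereq}(M))$ rather than constant). `naive_length` writes out both half-length subformulas, while `shared_length` writes the single universally quantified one.

```python
def naive_length(t, base=1, glue=1):
    """Symbol count when both recursive subformulas are written out."""
    if t <= 1:
        return base
    half = (t + 1) // 2  # integer form of ceil(t / 2)
    return glue + 2 * naive_length(half, base, glue)


def shared_length(t, base=1, glue=1):
    """Symbol count when one quantified subformula stands for both halves."""
    if t <= 1:
        return base
    half = (t + 1) // 2
    return glue + shared_length(half, base, glue)


if __name__ == "__main__":
    for m in (4, 8, 12):
        t = 2 ** m
        print(m, naive_length(t), shared_length(t))
```

Under these assumptions the naive count grows linearly in t (and so exponentially in the number of bits needed to write t), while the shared count grows only with the number of halvings.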

29.3.3 Other PSPACE-Hard and PSPACE-Complete Problems

QBF  is  not  the  only  PSPACE-complete  language.  There  are  others,  and  many  of  them 
exploit  the  quantifier  structure  that  QBF  provides.  We  mention  here  some  significant 
problems that are PSPACE-hard, many of which are also PSPACE-complete.

Two-Person  Games 

A quantified Boolean formula may exploit the quantifiers $\forall$ and $\exists$ in any order. But now
consider the specific case in which they alternate. For example, we might write
$\exists A\,(\forall B\,(\exists C\,(\forall D\,(P))))$, where P is a Boolean formula over the variables A, B, C, and D.
This alternation naturally describes the way a player in a two-player game evaluates
moves. So, for example, I could reason that a current game configuration is a guaranteed
win for me if there exists some move that I can make and then be guaranteed a win. But
then, to evaluate what will happen at the next move, I must consider that I don't get to
choose the move. My opponent does. So I can only conclude that the next configuration is
a win for me if all of the possible second moves lead to a win. At the next level, it is again
my turn to choose, so I'm interested in the existence of some winning move, and so forth.
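
The game-like reading of alternating quantifiers can be sketched as a short recursive evaluator. This is an illustration under assumed conventions, not a procedure given in the text: the prefix is a list of hypothetical `(quantifier, variable)` pairs, and the quantifier-free part P is passed as a Python function over a dictionary of truth values.

```python
def evaluate(prefix, matrix, assignment=None):
    """Decide a quantified Boolean formula by trying both values per variable.

    An "exists" level succeeds if some choice works (my move); a "forall"
    level succeeds only if every choice works (my opponent's move).
    """
    assignment = assignment or {}
    if not prefix:
        return matrix(assignment)
    quantifier, variable = prefix[0]
    outcomes = (
        evaluate(prefix[1:], matrix, {**assignment, variable: value})
        for value in (False, True)
    )
    return any(outcomes) if quantifier == "exists" else all(outcomes)


if __name__ == "__main__":
    prefix = [("exists", "A"), ("forall", "B"), ("exists", "C"), ("forall", "D")]
    formula = lambda v: (v["A"] or v["B"]) and (v["C"] == v["B"])
    print(evaluate(prefix, formula))
```

The recursion depth equals the number of quantified variables, so the space used is polynomial in the size of the formula even though the running time may be exponential.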

The  theory  of  asymptotic  complexity  that  we  have  developed  doesn't  tell  us  any¬ 
thing  about  solving  a  single  problem  of  fixed  size.  So  it  can’t  be  applied  directly  to 
games  of  fixed  size.  But  it  can  be  applied  if  we  generalize  the  games  to  configurations 
of  arbitrary  size.  When  we  do  that,  we  discover  that  many  popular  games  are  PSPACE- 
hard.  Some  of  them  are  also  in  PSPACE,  and  so  are  PSPACE-complete.  Some  appear 
to  be  harder.  In  particular: 

• If the length of a game (i.e., the number of moves that occur before the game is
over) is bounded by some polynomial function of the size of the game, then the
game is likely to be PSPACE-complete.

• If the length of the game may grow exponentially with the size of the game, then
the game is likely not to be solvable in polynomial space. But it is likely to be solvable
in exponential time and thus to be EXPTIME-complete.


Many  real  games  are  interesting  precisely  because  they  are  too  hard  to  be 
practically  solvable  by  brute  force  search.  We  briefly  discuss  a  few  of  them, 
including Sudoku, chess, and Go, in N.2.
Questions about Languages and Automata

In Parts II, III, and IV, we described a variety of decision procedures for regular,
context-free, and context-sensitive languages. During most of those discussions, we focused
simply on decidability and ignored issues of complexity. We can now observe that several
quite straightforward questions, while decidable, are hard. These include the following.
After each claim is a source for more information.

Finite state machine inequivalence: We showed, in Chapter 4, that it is decidable
whether two FSMs are equivalent. Now define:

• NeqNDFSMs = $\{\langle M_1, M_2 \rangle : M_1$ and $M_2$ are NDFSMs and $L(M_1) \neq L(M_2)\}$.

NeqNDFSMs is PSPACE-complete [Garey and Johnson 1979].

Finite state machine intersection: We showed, in Chapter 8, that the regular languages
are closed under intersection. And we showed, in Chapter 9, that it is decidable
whether the language accepted by an FSM is empty. So we know that the following language
is decidable:

• 2FSMs-INTERSECT = $\{\langle M_1, M_2 \rangle : M_1$ and $M_2$ are deterministic FSMs and
$L(M_1) \cap L(M_2) \neq \emptyset\}$.

2FSMs-INTERSECT is in P. So it is tractable. But now consider a generalization to
an arbitrary number of FSMs:

• FSMs-INTERSECT = $\{\langle M_1, \ldots, M_n \rangle : M_1$ through $M_n$ are deterministic
FSMs and there exists some string accepted by all of them$\}$.

FSMs-INTERSECT is PSPACE-complete [Garey and Johnson 1979].
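
One way to see why the two-machine case is tractable while the general case is not obviously so is to search the product of the machines' state sets. The sketch below is an assumption-laden illustration, not the algorithm from the cited source: the dictionary encoding of a deterministic FSM and the example machines `even_a`, `odd_a`, and `ends_b` are all hypothetical.

```python
from collections import deque


def fsms_intersect(machines, alphabet):
    """Return True iff some string is accepted by every machine in the list.

    Each machine is a dict with keys "start", "accepting" (a set), and
    "delta" (a dict from (state, symbol) to the next state).
    """
    start = tuple(m["start"] for m in machines)
    seen = {start}
    frontier = deque([start])
    while frontier:
        states = frontier.popleft()
        if all(q in m["accepting"] for q, m in zip(states, machines)):
            return True
        for symbol in alphabet:
            step = [m["delta"].get((q, symbol)) for q, m in zip(states, machines)]
            if any(q is None for q in step):
                continue  # some machine has no move, so this branch is dead
            nxt = tuple(step)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False


# Hypothetical example machines over the alphabet {a, b}.
parity = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}
even_a = {"start": 0, "accepting": {0}, "delta": parity}
odd_a = {"start": 0, "accepting": {1}, "delta": parity}
ends_b = {"start": 0, "accepting": {1},
          "delta": {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}}

if __name__ == "__main__":
    print(fsms_intersect([even_a, ends_b], "ab"))
    print(fsms_intersect([even_a, odd_a], "ab"))
```

For two machines the set `seen` holds at most the product of the two state counts, which is polynomial. For n machines it may hold a number of tuples that is exponential in n, so this particular search shows neither tractability nor membership in PSPACE for the general problem.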

Regular expression inequivalence: We showed, in Chapter 6, that there exists an algorithm
that can convert any regular expression into an equivalent FSM. So any question
that is decidable for FSMs must also be decidable for regular expressions. So we
know that the following language is decidable:

• NeqREGEX = $\{\langle E_1, E_2 \rangle : E_1$ and $E_2$ are regular expressions and $L(E_1) \neq L(E_2)\}$.

NeqREGEX is PSPACE-complete [Garey and Johnson 1979].

Regular expression incompleteness: We showed, in Chapter 9, that it is decidable
whether a regular language (described either as an FSM or as a regular expression) is
equivalent to $\Sigma^*$. So we know that the following language is decidable:

• NOT-SIGMA-STAR = $\{\langle E \rangle : E$ is a regular expression and $L(E) \neq \Sigma_E^*\}$.

NOT-SIGMA-STAR is PSPACE-complete [Sudkamp 2006].

Regular expression with squaring incompleteness: Define the language of regular
expressions with squaring to be exactly the same as the language of regular expressions
with the addition of one new operator defined as follows:

• If $\alpha$ is a regular expression with squaring, then so is $\alpha^2$. $L(\alpha^2) = L(\alpha)L(\alpha)$.

Notice that the squaring operator does not introduce any descriptive power to the language
of regular expressions. It does, however, make it possible to write shorter equivalents
for some regular expressions. In particular, consider the regular expression that is composed
of $2^n$ copies of a concatenated together (for some value of n). Its length is $O(2^n)$.
Using squaring, we can write an equivalent regular expression, $(\ldots(((a)^2)^2)\ldots)^2$, with the
squaring operator applied n times, whose length is $O(n)$. Since the complexity of any problem
that requires reasoning about regular expressions is defined in terms of the length of
the expression, this exponential compression of the size of an input string can be expected
to make a difference. And it does. Define the language:

• NOT-SIGMA-STAR-SQUARING = $\{\langle E \rangle : E$ is a regular expression with squaring
and $L(E) \neq \Sigma_E^*\}$.

While NOT-SIGMA-STAR (for standard regular expressions) is PSPACE-complete,
NOT-SIGMA-STAR-SQUARING is provably not in PSPACE [Sudkamp 2006]. So we
know that, since P $\subseteq$ PSPACE, no polynomial-time algorithm can exist for
NOT-SIGMA-STAR-SQUARING.
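
The compression that squaring provides can be checked directly. In this small sketch the ASCII spelling `^2` for the squaring operator and the function names are assumptions made for illustration only.

```python
def squared_a(n):
    """Expression for 2**n copies of 'a' using n nested squarings."""
    return "(" * n + "a" + ")^2" * n


def expanded_a(n):
    """The equivalent expression written without squaring."""
    return "a" * (2 ** n)


if __name__ == "__main__":
    print(squared_a(3))  # (((a)^2)^2)^2
    for n in (5, 10, 20):
        print(n, len(squared_a(n)), len(expanded_a(n)))
```

With this spelling the squared form has length 4n + 1, while the expanded form has length $2^n$.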

The membership question for context-sensitive languages: In Section 24.1, we described
two techniques for answering the question, given a context-sensitive language
L and a string w, is $w \in L$? One approach simulated the computation of a linear bounded
automaton; the other simulated the generation of strings by a context-sensitive
grammar. Unfortunately, neither of those techniques is efficient and it seems unlikely
that better ones exist. Define the language:

• CS-MEMBERSHIP = $\{\langle G, w \rangle : w \in L(G)\}$.

CS-MEMBERSHIP is PSPACE-complete [Garey and Johnson 1979].


29.4  Sublinear  Space  Complexity 

It doesn't make much sense to talk about algorithms whose time complexity is $o(n)$,
i.e., algorithms whose time complexity is less than linear. Such algorithms do not have
time to read their entire input. But when we turn our attention to space complexity, it
does make sense to consider algorithms that use a sublinear amount of working space
(in addition to the space required to hold the original input). For example, consider a
program P that is fed a stream of input events, eventually followed by an <end> symbol.
P's job is to count the number of events that occur before the <end>. It doesn't
need to remember the input stream. So the only working memory that is required is a
single counter, which can be represented in binary. Thus, ignoring the space required by
the input stream, $\mathit{spacereq}(P) \in O(\log n)$.

To  make  it  easy  to  talk  about  the  space  complexity  of  programs  like  P  and  the  prob¬ 
lems  that  they  solve,  we  will  make  the  following  modification  to  our  computational 
model.  We  will  consider  Turing  machines  with  two  tapes: 

• a read-only input tape, and
• a read-write working tape.

While the input tape is read-only, it is not identical to the input stream of a finite
state machine or of the simple counting example that we just described. The
machine may move back and forth on the input tape, thus examining it any number
of times.
Now we will define spacereq(M) by counting only the number of visited cells of the
read-write (working) tape. Notice that, if spacereq(M) is at least linear, then this measure
is equivalent to our original one since M's input can be copied from the input tape
to the read-write tape in $O(n)$ space.

Using this new notion of space complexity, we can define two new and important
space complexity classes. But first we must resolve a naming conflict. We have been
using the variable L to refer to an arbitrary language. By convention, one of the complexity
classes that we are about to define is named L. To avoid confusion, we'll use the
variable L# for languages when necessary. Now we can state the following definitions:

The Class L: L# $\in$ L iff there exists some deterministic Turing machine M that decides
L# and $\mathit{spacereq}(M) \in O(\log n)$.

The Class NL: L# $\in$ NL iff there exists some nondeterministic Turing machine M
that decides L# and $\mathit{spacereq}(M) \in O(\log n)$.

We have chosen to focus on $O(\log n)$ because:

• Many useful problems can be solved in $O(\log n)$ space. For example:

  • It is enough to remember the length of an input.

  • It is enough to remember a constant number of pointers into the input.

  • It is enough to remember a logarithmic number of Boolean values.

• It is unaffected by some reasonable changes in the way inputs are encoded. For example,
it continues not to matter what base, greater than 1, is used for representing
numbers.

• Savitch's Theorem can be extended to cases where $\mathit{spacereq}(M) \geq \log n$.


EXAMPLE 29.4 The Balanced Parentheses Language is in L

Recall the balanced parentheses language Bal = $\{w \in \{\texttt{)}, \texttt{(}\}^* :$ the parentheses
are balanced$\}$. We have seen that Bal is not regular but it is context-free. It is also in
L. It can be decided by a deterministic Turing machine M that uses its working tape
to store a count, in binary, of the number of left parentheses that have not yet been
matched. M will make one pass through its input. Each time it sees a left parenthesis,
it will increment the count by one. Each time it sees a right parenthesis, it will
decrement the count by one if it was positive. If, on the other hand, the count was
zero, M will immediately reject. If, when M reaches the end of the input, the count is
zero, it will accept. Otherwise it will reject. The amount of space required to store
the counter grows as $O(\log |w|)$, so $\mathit{spacereq}(M) \in O(\log n)$.
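
The single-pass counting machine of Example 29.4 can be mirrored in a few lines. This is a sketch rather than a Turing machine: the function name `decide_bal` is hypothetical, and Python's integer stands in for the binary counter on the working tape.

```python
def decide_bal(w):
    """Accept iff w is a balanced string over the alphabet { '(', ')' }."""
    unmatched = 0  # left parentheses seen but not yet matched
    for symbol in w:
        if symbol == "(":
            unmatched += 1
        elif symbol == ")":
            if unmatched == 0:
                return False  # a right parenthesis with nothing to match
            unmatched -= 1
        else:
            return False  # symbol outside the alphabet
    return unmatched == 0


if __name__ == "__main__":
    for w in ("", "(()())", "())(", "(("):
        print(repr(w), decide_bal(w))
```

The only state carried across input symbols is `unmatched`, whose value never exceeds the input length and so needs a number of bits logarithmic in that length.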


EXAMPLE 29.5 USTCON: Finding Paths in Undirected Graphs is also in L

Let USTCON = $\{\langle G, s, t \rangle : G$ is an undirected graph and there exists an undirected
path in G from s to t$\}$. In our discussion of finite state machines, we exploited
an algorithm to find the states that are reachable from the start state. We used the
same idea in analyzing context-free grammars to find nonterminals that are useless
(because they aren't reachable from the start symbol). The obvious way to solve
USTCON is the same way we solved those earlier problems: We start at s and mark
the vertices that are connected to it via a single edge. Then we take each marked vertex
and follow edges from it. We halt whenever we have marked the destination vertex
t or we have made a complete pass through the vertices and marked no new ones.
To decide USTCON then, we accept if t was marked and reject otherwise.
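
A minimal sketch of this marking procedure follows. The edge-list input format and the function name are assumptions, and the sketch treats s itself as marked from the start, so that a path of length zero counts.

```python
def ustcon_marking(edges, s, t):
    """Decide whether t is reachable from s in an undirected graph."""
    neighbors = {}
    for u, v in edges:
        # Record each undirected edge in both directions.
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    marked = {s}
    changed = True
    while changed and t not in marked:
        changed = False
        for u in list(marked):
            for v in neighbors.get(u, ()):
                if v not in marked:
                    marked.add(v)
                    changed = True
    return t in marked


if __name__ == "__main__":
    graph = [(1, 2), (2, 3), (4, 5)]
    print(ustcon_marking(graph, 1, 3))
    print(ustcon_marking(graph, 1, 5))
```

The set `marked` may grow to contain every vertex, so the working storage of this sketch is linear, not logarithmic, in the size of the graph.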

The simple decision procedure that we just described shows that USTCON is in
P. But it fails to show that USTCON can be decided in logarithmic space because it
requires (to store the marks) one bit of working storage for each vertex in G. An
alternative approach shows that USTCON is in NL. Define a nondeterministic
Turing machine M that searches for a path from s to t but only remembers the most
recent vertex on the path. M begins by counting the vertices in G and recording the
count (in binary) on its working tape. Then it starts at s and looks for a path. At
each step, it nondeterministically chooses an edge from the most recent vertex it
has visited to some new vertex. It stores on its working tape the index (in binary)
of the new vertex. And it decrements its count by 1. If it ever selects vertex t, it halts
and accepts. If, on the other hand, its count reaches 0, it halts and rejects. If there is
a path from s to t, there must be one whose length is no more than the total number
of vertices in G. So M will find it. And it uses only logarithmic space since both
the step counter and the vertex index can be stored in space that grows logarithmically
with the size of G.

It turns out that USTCON is also in L. In other words, there is a deterministic,
logarithmic-space algorithm that decides it. That algorithm is described in
[Reingold 2005].


What,  if  anything,  can  we  say  about  the  relationship  between  L,  NL,  and  the  other 
complexity  classes  that  we  have  already  considered?  First,  we  note  that  trivially  (since 
every  deterministic  Turing  machine  is  also  a  nondeterministic  one  and  since 
$\log n \in O(n)$):

L $\subseteq$ NL $\subseteq$ PSPACE.

But  what  about  the  relationship  between  L  and  NL  in  the  other  direction?  We  know 
of  no  languages  that  are  in  NL  and  that  can  be  proven  not  to  be  in  L.  But  neither  can 
we  prove  that  L  =  NL.  The  L  =  NL  question  exists  in  an  epistemological  limbo  anal¬ 
ogous  to  the  P  =  NP  question.  In  both  cases,  it  is  widely  assumed,  but  unproven,  that 
the  answer  to  the  question  is  no. 

As  in  the  case  of  the  P  =  NP  question,  one  way  to  increase  our  understanding  of  the 
L  =  NL  question  is  to  define  a  class  of  languages  that  are  at  least  as  hard  as  every  lan¬ 
guage  in  NL.  As  before,  we  will  do  this  by  defining  a  technique  for  reducing  one  language 
to another. In all the cases we have considered so far, we have used polynomial-time
reduction. But, as we will see below, NL $\subseteq$ P. So a polynomial-time reduction could dominate
a logarithmic space computation. To be informative, we need to define a weaker
notion of reduction. The one we will use is called logarithmic-space (or simply log-space)
reduction. We will say that a language $L_1$ is log-space reducible to $L_2$ iff there is a deterministic
two-tape Turing machine (as described above) that reduces $L_1$ to $L_2$ and whose
working tape uses no more than $O(\log n)$ space. Now we can define:

The Class NL-hard: A language L# is NL-hard iff every language in NL is log-space
reducible to it.

The Class NL-complete: A language L# is NL-complete iff it is NL-hard and it is
in NL.

Analogously to the case of NP-completeness, if we could find a single NL-complete
language that is also in L, we would know that L = NL. So far, none has been found.
But there are NL-complete languages. We mention one next.


EXAMPLE 29.6 STCON: Finding Paths in Directed Graphs

Let STCON = $\{\langle G, s, t \rangle : G$ is a directed graph and there exists a directed path
in G from s to t$\}$. Note that STCON is like USTCON except that it asks for a path
in a directed (rather than an undirected) graph. STCON is in NL because it can be
decided by almost the same nondeterministic, log-space Turing machine that we
described as a way to decide USTCON. The only difference is that now we must
consider the direction of the edges that we follow.

Unlike in the case of USTCON, we know of no algorithm that shows that
STCON is in L. Instead, it is possible to prove that STCON is NL-complete.


So we don't know whether L = NL. We also don't know the exact relationship
among L, NL, and P. But it is straightforward to prove the following result about the
relationship between L and P.

THEOREM 29.7 L $\subseteq$ P
Theorem: L $\subseteq$ P.

Proof: Any language in L can be decided by a deterministic Turing machine M,
where $\mathit{spacereq}(M) \in O(\log n)$. We can show that M must run in polynomial time
by showing a bound on the number of distinct configurations it can enter. Since it
halts, it can never enter the same configuration a second time. So we have a
bound on the number of steps it can execute. Although M has two tapes, the
contents of the first one remain the same in all configurations. So the number of
distinct configurations of M on input of length n is the product of:

• the number of possible positions for the read head on the input tape. This is
simply n.

• the number of different values for the working tape. Each square of the working
tape can take on any element of $\Gamma$ (M's tape alphabet). So the maximum
number of different values of the working tape is $|\Gamma|^{\mathit{spacereq}(M)}$.

• the number of positions of the working tape's read/write head. This is bounded
by its length, spacereq(M).

• the number of states of M. Call that number k.

Then, on an input of length n, the maximum number of distinct configurations
of M is:

$n \cdot |\Gamma|^{\mathit{spacereq}(M)} \cdot \mathit{spacereq}(M) \cdot k$.

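Since M is deciding a language in L, $\mathit{spacereq}(M) \in O(\log n)$. The number k is
independent of n. So the maximum number of distinct configurations of M is
$O(n \cdot |\Gamma|^{\log n} \cdot \log n)$ or, simplifying, $O(n^{1 + \log |\Gamma|} \cdot \log n)$. Thus timereq(M) is also
$O(n^{1 + \log |\Gamma|} \cdot \log n)$ and thus $O(n^{2 + \log |\Gamma|})$, which is polynomial in n. (A constant
factor hidden in the $O(\log n)$ space bound multiplies the exponent by that constant,
so the bound remains polynomial.) So the language that M decides is in P.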
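
The simplification in this proof rests on exchanging the base and the exponent of a power. The derivation below is a sketch under one added assumption: that $\mathit{spacereq}(M) \leq c \log_2 n$ for some constant c, a symbol introduced here only for illustration. Setting c = 1 gives the form used in the proof.

```latex
\begin{align*}
|\Gamma|^{c \log_2 n}
  &= \left(2^{\log_2 |\Gamma|}\right)^{c \log_2 n}
   = \left(2^{\log_2 n}\right)^{c \log_2 |\Gamma|}
   = n^{c \log_2 |\Gamma|}, \\
n \cdot |\Gamma|^{c \log_2 n} \cdot (c \log_2 n) \cdot k
  &= c\,k \cdot n^{1 + c \log_2 |\Gamma|} \cdot \log_2 n
   \;\in\; O\!\left(n^{2 + c \log_2 |\Gamma|}\right).
\end{align*}
```

The last step uses only the fact that $\log_2 n \leq n$, so the factor $\log_2 n$ is absorbed by raising the exponent by one.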

It is also possible to prove the following theorem, which makes the stronger claim
that NL $\subseteq$ P.


THEOREM 29.8 NL $\subseteq$ P

Theorem: NL $\subseteq$ P.

Proof: The proof relies on facts about STCON = $\{\langle G, s, t \rangle : G$ is a directed
graph and there exists a directed path in G from s to t$\}$. STCON is in P because it
can be decided by the polynomial-time marking algorithm that we described in
our discussion of USTCON in Example 29.5. STCON is also NL-complete, which
means that any other language in NL can be reduced to it in deterministic
logarithmic space. But any deterministic log-space Turing machine also runs in
polynomial time because the number of distinct configurations that it can enter is
bounded by a polynomial, as we saw above in the proof that L $\subseteq$ P. So any
language in NL can be decided by the composition of two deterministic,
polynomial-time Turing machines and thus is in P.

We  can  summarize  what  we  know  as: 


L $\subseteq$ NL $\subseteq$ P $\subseteq$ PSPACE.


Just  as  we  have  done  for  the  other  complexity  classes  that  we  have  considered,  we  can 
define  classes  that  contain  the  complements  of  languages  in  L  and  in  NL.  So  we  have: 

The Class co-L: L# $\in$ co-L iff $\neg$L# $\in$ L.

The Class co-NL: L# $\in$ co-NL iff $\neg$L# $\in$ NL.


It is easy to show that the class L is closed under complement and thus that
L = co-L: If a language L# is decided by a deterministic Turing machine M, then there
exists another deterministic Turing machine M' such that M' decides $\neg$L#. M' is simply
M with the y and n states reversed. spacereq(M') = spacereq(M). So, if L# is in the
class L, so is its complement.
It is less obvious, but true, that NL = co-NL. The proof follows from the more general
claim that we will state as Theorem 29.10 (the Immerman-Szelepcsenyi Theorem)
in the next section. There we will see that all nondeterministic space complexity classes
whose space requirement is at least log n are closed under complement.


29.5  The  Closure  of  Space  Complexity  Classes 
Under  Complement 

Recall that the class P is closed under complement. On the other hand, it is believed
that NP $\neq$ co-NP, although no proof of that exists. When we switch from considering
time complexity to considering space complexity, the situation is clearer. Both deterministic
and nondeterministic space-complexity classes are closed under complement.
The fact that deterministic ones are is obvious (since a deciding machine for $\neg$L is simply
the deciding machine for L with its accepting and rejecting states reversed). The
fact that nondeterministic ones are is not obvious.

To  make  it  easy  to  state  the  next  group  of  theorems,  we  will  define  the  following 
families  of  languages. 

• dspace(f(n)) = the set of languages that can be decided by some deterministic
Turing machine M, where $\mathit{spacereq}(M) \in O(f(n))$.

• ndspace(f(n)) = the set of languages that can be decided by some nondeterministic
Turing machine M, where $\mathit{spacereq}(M) \in O(f(n))$.

• co-dspace(f(n)) = the set of languages whose complements can be decided by
some deterministic Turing machine M, where $\mathit{spacereq}(M) \in O(f(n))$.

• co-ndspace(f(n)) = the set of languages whose complements can be decided by
some nondeterministic Turing machine M, where $\mathit{spacereq}(M) \in O(f(n))$.

THEOREM  29.9  Deterministic  Space-Complexity  Classes  are  Closed 
Under  Complement 

Theorem: For every function f(n), dspace(f(n)) = co-dspace(f(n)).

Proof: If L is a language that is decided by some deterministic Turing machine M,
then the deterministic Turing machine M' that is identical to M except that the
halting states y and n are reversed decides $\neg$L. spacereq(M') = spacereq(M).
So, if L $\in$ dspace(f(n)), so is $\neg$L.

THEOREM  29.10  The  Immerman-Szelepcsenyi  Theorem  and  the  Closure 
of  Nondeterministic  Space-Complexity  Classes  Under 
Complement 

Theorem: For every function $f(n) \geq \log n$, ndspace(f(n)) = co-ndspace(f(n)).

Proof: The proof of this claim, that the nondeterministic space complexity classes
are closed under complement, was given independently in [Immerman 1988] and
[Szelepcsenyi 1988].
One application of the Immerman-Szelepcsenyi Theorem is to a question we asked
in Section 24.1, "Are the context-sensitive languages closed under complement?" We
are now in a position to sketch a proof of Theorem 24.11, which we restate here as
Theorem 29.11.


THEOREM  29.11  Closure  of  the  Context-Sensitive  Languages 
Under  Complement 

Theorem:  The  context-sensitive  languages  are  closed  under  complement. 

Proof: Recall that a language is context-sensitive iff it is accepted by some linear
bounded automaton (LBA). An LBA is a nondeterministic Turing machine
whose space is bounded by the length of its input. So the class of context-sensitive
languages is exactly ndspace(n). By the Immerman-Szelepcsenyi Theorem,
ndspace(n) = co-ndspace(n). So the complement of every context-sensitive
language can also be decided by a nondeterministic Turing machine that uses
linear space (i.e., an LBA). Thus it too is context-sensitive.


29.6  Space  Hierarchy  Theorems 

We  saw,  in  Section  28.9.1,  that  giving  a  Turing  machine  more  time  increases  the  class  of 
languages  that  can  be  decided.  The  same  is  true  of  increases  in  space.  We  can  prove  a 
pair  of  space  hierarchy  theorems  that  are  similar  to  the  time  hierarchy  theorems  that 
we  have  already  described.  The  main  difference  is  that  the  space  hierarchy  theorems 
that  we  can  prove  are  stronger  than  the  corresponding  time  hierarchy  ones  because 
running  space-bounded  simulations  does  not  require  the  overhead  that  appears  to  be 
required  in  the  time-bounded  case. 

Before we can define the theorems, we must define the class of space-requirement
functions to which they will apply. So, analogously (but not identically) to the way we
defined time-constructibility, we will define space-constructibility:

A function s(n) from the positive integers to the positive integers is
space-constructible iff:

• $s(n) \geq \log n$, and

• the function that maps the unary representation of n (i.e., $1^n$) to the binary
representation of s(n) can be computed in $O(s(n))$ space.

Most useful functions, as long as they are at least log n, are space-constructible.
Whenever we say that, for some Turing machine M, $\mathit{spacereq}(M) \in o(n)$, we are
using, as our definition of a Turing machine, the two-tape machine that we described in
Section 29.4. In that case, we will take spacereq(M) to be the size of M's working tape.


THEOREM  29.12  Deterministic  Space  Hierarchy  Theorem 

Theorem: For any space-constructible function s(n), there exists a language $L_{s(n)hard}$
that is decidable in $O(s(n))$ space but that is not decidable in $o(s(n))$ space.

Proof: The proof is by diagonalization and is similar to the proof we gave for
Theorem 28.27 (the Deterministic Time Hierarchy Theorem). The tighter bound
in this theorem comes from the fact that it is possible to describe an efficient
space-bounded simulator. The details of the proof, and in particular, the design of
the simulator, are left as an exercise.


Exercises 

1. In Section 29.1.2, we defined MaxConfigs(M) to be
$|K| \cdot |\Gamma|^{\mathit{spacereq}(M)} \cdot \mathit{spacereq}(M)$. We then claimed that, if c is a constant greater than $|\Gamma|$, then
$\mathit{MaxConfigs}(M) \in O(c^{\mathit{spacereq}(M)})$. Prove this claim by proving the following more
general claim:

Given: f is a function from the natural numbers to the positive reals,
f is monotonically increasing and unbounded,
a and c are positive reals, and
1 < a < c.

Then: $f(n) \cdot a^{f(n)} \in O(c^{f(n)})$.

2. Prove that PSPACE is closed under:
a. complement.

b. union.

c. concatenation.

d. Kleene star.

3. Define the language:

• U = $\{\langle M, w, 1^s \rangle : M$ is a Turing machine that accepts w within space s$\}$.

Prove that U is PSPACE-complete.

4. In Section 28.7.3, we defined the language 2-SAT = $\{\langle w \rangle : w$ is a wff in
Boolean logic, w is in 2-conjunctive normal form and w is satisfiable$\}$ and saw that
it is in P. Show that 2-SAT is NL-complete.

5. Prove that $A^nB^n = \{a^n b^n : n \geq 0\}$ is in L.

6. In Example 21.5, we described the game of Nim. We also showed an efficient technique
for deciding whether or not the current player has a guaranteed win. Define
the language:

• NIM = $\{\langle b \rangle : b$ is a Nim configuration (i.e., a set of piles of sticks) and there
is a guaranteed win for the current player$\}$.

Prove that NIM $\in$ L.

7.  Prove  Theorem  29.12  (The  Deterministic  Space  Hierarchy  Theorem). 


CHAPTER  30 


Practical Solutions
for  Hard  Problems 

It appears unlikely that P = NP. It appears even more unlikely that, even if it does,
a proof that it does will lead us to efficient algorithms to solve the hard problems
that  we  have  been  discussing.  (We  base  this  second  claim  on  at  least  two  observa¬ 
tions.  The  first  is  that  people  have  looked  long  and  hard  for  such  algorithms  and  have 
failed  to  find  them.  The  second  is  that  just  being  polynomial  is  not  sufficient  to  assure 
efficiency  in  any  practical  sense.)  And  things  are  worse.  Some  problems,  for  example 
those  with  a  structure  like  generalized  chess,  are  provably  outside  of  P,  whatever  the 
verdict  on  NP  is.  Yet  important  applications  depend  on  algorithms  to  solve  these  prob¬ 
lems.  So  what  can  we  do? 


30.1  Approaches 

In  our  discussion  of  the  traveling  salesman  problem  at  the  beginning  of  Chapter  27,  we 
suggested  two  strategies  for  developing  an  efficient  algorithm  to  solve  a  hard  problem: 

Compromise on generality: Design an algorithm that finds an optimal solution and that
runs efficiently on most (although not necessarily all) problem instances. This approach is
particularly useful if the problems that we actually care about solving possess particular
kinds of structures and we can find an algorithm that is tuned to work well on those structures.
We have already considered some examples of this approach:

• Very large real instances of the traveling salesman problem can be solved efficiently by
iteratively solving a linear programming problem that is a relaxed instance of the exact
problem. Although, in principle, it could happen that each such iteration removes only
a single tour from consideration, when the graph corresponds to a real problem, large
numbers of tours can almost always be eliminated at each step.

• Some very large Boolean formulas can be represented efficiently using ordered
binary decision diagrams (OBDDs), as described in B.1.3. That efficient representation
makes it possible to solve the satisfiability (SAT) problem efficiently. The
OBDD representation of a randomly constructed Boolean formula may not be
compact. But OBDDs exploit exactly the kinds of structures that typically appear
in formulas that have been derived from natural problems, such as digital circuits.

Compromise  on  optimality:  Design  an  approximation  algorithm  that  is  guaranteed 
to  find  a  good  (although  not  necessarily  optimal)  solution  and  to  do  so  efficiently. This 
approach  is  particularly  attractive  if  the  error  that  may  be  introduced  in  finding  the  so¬ 
lution  is  relatively  small  in  comparison  with  errors  that  may  have  been  introduced  in 
the  process  of  defining  the  problem  itself.  For  example,  in  any  real  instance  of  the  trav¬ 
eling  salesman  problem,  we  must  start  by  measuring  the  physical  world  and  no  such 
measurement  can  be  exact.  Or  consider  the  large  class  of  problems  in  which  we  seek  to 
maximize  (or  minimize)  the  value  of  some  objective  function  that  combines,  into  a  sin¬ 
gle  number,  multiple  numbers  that  measure  the  utility  of  a  proposed  solution  along 
two  or  more  dimensions.  For  example,  we  might  define  a  cost  function  for  a  proposed 
stretch  of  new  divided  highway  to  be  something  like: 

cost(s) = 4 · dollar-cost(s) - 2 · number-of-lives-saved-by(s)
          - 1.5 · commuting-hours-saved-per-week(s).

Since  the  objective  function  is  only  an  approximate  measure  of  the  utility  of  a  new 
road,  an  approximately  optimal  solution  to  a  highway  system  design  problem  may  be 
perfectly  acceptable. 

Compromise on both: For some problems, it turns out that if we make some assumptions
about problem structure then we can find very good, but not necessarily optimal,
solutions very quickly. For example, suppose that we limit the traveling salesman problem
to graphs that satisfy the triangle inequality (as described in Section 27.1). Real-world
maps meet that constraint. Then there exist algorithms for finding very good solutions
very quickly.

A  fourth  approach,  useful  in  some  kinds  of  problems  is: 

Compromise  on  total  automation:  Design  an  algorithm  that  works  interactively 
with  a  human  user  who  guides  it  into  the  most  promising  regions  of  its  search  space. 


When  applied  to  many  practical  problems,  including  verifying  the  correct¬ 
ness  of  both  hardware  and  software  systems,  automatic  theorem  provers  face 
exponential  growth  in  the  number  of  paths  that  must  be  considered.  One 
way  to  focus  such  systems  on  paths  that  are  likely  to  lead  to  the  desired 
proofs is to let a human user guide the system. (H.1.1)


In most of these approaches, we are admitting that we have no efficient and “direct”
algorithm for finding the answer that we seek. Instead, we conduct a search through a
space that is defined by the structure of the problem we are trying to solve. In the next
two sections we will sketch two quite different approaches to conducting that search.
In particular, we'll consider:

• Approach 1: The space is structured randomly. Exploit that randomness.

• Approach 2: The space isn't structured randomly and we have some knowledge
about the structure that exists. Exploit that knowledge.


30.2 Randomized Algorithms and the Language Classes
BPP, RP, Co-RP and ZPP

For  some  kinds  of  problems,  it  is  possible  to  avoid  the  expensive  behavior  of  an  ex¬ 
haustive  search  algorithm  by  making  a  sequence  of  random  guesses  that,  almost  all  of 
the  time,  converge  efficiently  to  an  answer  that  is  correct. 


EXAMPLE  30.1  Quicksort 

We’ll illustrate the idea of random guessing with a common algorithm that reduces
the expected time required to sort a list from O(n^2) to O(n log n). Given a
list of n elements, define quicksort as follows:

quicksort(list:  a  list  of  n  elements)  = 

1. If n is 0 or 1, return list. Otherwise:

2.  Choose  an  element  from  list.  Call  it  the  pivot. 

3.  Reorder  the  elements  in  list  so  that  every  element  that  is  less  than  pivot 
occurs  ahead  of  it  and  every  element  that  is  greater  than  pivot  occurs 
after  it.  If  there  are  equal  elements,  they  may  be  left  in  any  order. 

4.  Recursively  call  quicksort  with  the  fragment  of  list  that  includes  the  ele¬ 
ments  up  to,  but  not  including  pivot. 

5.  Recursively  call  quicksort  with  the  fragment  of  list  that  includes  all  the 
elements  after  pivot. 

Quicksort always halts with its input list correctly sorted. At issue is the time required
to do so. Step 3 can be done in O(n) steps. In fact, it can usually be implemented
very efficiently. When step 3 is complete, pivot is in the correct place in list.

In the worst case, quicksort runs in O(n^2) time. This happens if, at each step,
the reordering places all the elements on the same side of pivot. Then the
length of list is reduced by only 1 each time quicksort is called. In the best case,
however, the length of list is cut in half each time. When this happens, quicksort
runs in O(n log n) time.

The key to quicksort's performance is a judicious choice of pivot. One particularly
bad strategy is to choose the first element of list. In the not uncommon case
in which list is already sorted, or nearly sorted, this choice will force worst-case
performance. Any other systematic choice may also be bad if list is constructed by
a malicious attacker with the goal of forcing worst-case behavior. The solution to
this problem is to choose pivot randomly. When that is done, quicksort's expected
running time, like its best case running time, is O(n log n).
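The steps of quicksort with a randomly chosen pivot can be sketched in Python as follows. This is a minimal illustration, not the in-place reordering that step 3 describes: it builds new lists for clarity. The function name and the optional rng parameter are assumptions made for this sketch.

```python
import random

def quicksort(items, rng=random):
    """Return a sorted copy of items, choosing each pivot at random."""
    if len(items) <= 1:                        # step 1: nothing to sort
        return list(items)
    pivot = items[rng.randrange(len(items))]   # step 2: random pivot
    # Step 3 (not in place here): split around the pivot.
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    # Steps 4 and 5: sort the two fragments recursively.
    return quicksort(smaller, rng) + equal + quicksort(larger, rng)
```

Because the pivot is random, an already sorted input is no more likely than any other input to trigger the worst case.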


In the next section, we'll take the idea of random guessing and use it to build Turing
machines that decide languages.


30.2.1 Randomized Algorithms

A  randomized  algorithm  (sometimes  called  a  probabilistic  algorithm)  is  one  that 
exploits  the  random  guessing  strategy  that  we  have  just  described.  Randomized  algo¬ 
rithms  are  used  when: 

•  The  problem  at  hand  can  usually  be  solved  without  exhaustively  considering  all 
paths  to  a  solution. 

•  A  systematic  way  of  choosing  paths  would  be  vulnerable  to  common  kinds  of  bad 
luck  (for  example,  being  asked  to  sort  a  list  that  was  already  sorted)  or  to  a  malicious 
attacker  that  would  explicitly  construct  worst-case  instances  if  it  knew  how  to  do  so. 


Randomized  algorithms  are  routinely  exploited  in  cryptographic 
applications.  (J.3) 


We can describe randomized algorithms as Turing machines. Call every step at
which a nondeterministic Turing machine must choose from among competing moves a
choice point. Define a randomized Turing machine to be a nondeterministic Turing
machine M with the following properties.

• At every choice point, there are exactly two moves from which to choose.

• At every choice point, M (figuratively) flips a fair coin and uses the result of the
coin toss to decide which of its two branches to pursue.

Note that the constraint of exactly two moves at each choice point is only significant
in the sense that it will simplify our analysis of the behavior of these machines. Any
nondeterministic Turing machine can be converted into one with a branching factor of
two by replacing an n-way branch with several two-way ones.

Since the coin flips are independent of each other, we have that, if b is a single path
in a randomized Turing machine M and the number of choice points along b is k, then
the probability that M will take b is:


Pr(b) = 2^{-k}.

Note that every deterministic Turing machine is a randomized Turing machine that
happens, on every input, to have zero choice points and thus a single branch whose
probability is 1.

Now consider the specific case in which the job of M is to decide a language. A standard
(nonrandomized) nondeterministic Turing machine accepts its input w iff there is at least
one path that accepts. A randomized Turing machine only follows one path. It accepts iff
that path accepts. It rejects iff that path rejects. So the probability that M accepts w is the
sum of the probabilities of all of M's accepting paths. The probability that M rejects w is the
sum of the probabilities of all of M's rejecting paths. Alternatively, it is 1 - Pr(M accepts w).
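To make the path-probability calculation concrete, here is a small sketch that models the computation of a randomized machine on one input as a binary tree. The representation is an assumption made for illustration only: a leaf is True (accept) or False (reject), and an internal node is a pair (heads branch, tails branch). Each path with k choice points gets probability 2^{-k}, and the acceptance probability is the sum over accepting paths.

```python
from fractions import Fraction

def path_probabilities(tree, prob=Fraction(1)):
    """Yield (outcome, probability) for every root-to-leaf path."""
    if isinstance(tree, bool):          # a leaf: the path halts here
        yield tree, prob
    else:                               # a choice point: fair coin flip
        heads, tails = tree
        yield from path_probabilities(heads, prob / 2)
        yield from path_probabilities(tails, prob / 2)

def acceptance_probability(tree):
    """Sum the probabilities of all accepting paths."""
    return sum((p for outcome, p in path_probabilities(tree) if outcome),
               Fraction(0))

# A hypothetical computation tree with paths of 1, 2, 3, and 3 choice points.
example = (True, (False, (True, False)))
accept = acceptance_probability(example)   # 1/2 + 1/8
reject = 1 - accept
```

In this example the accepting paths have probabilities 1/2 and 1/8, so the machine accepts with probability 5/8 and rejects with probability 3/8.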

If the job of a randomized Turing machine M is to accept the language L, then there
are two kinds of mistakes it could make: It could erroneously accept a string that is not
in L, or it could erroneously reject one that is. We would like to be able to place a
bound on the likelihood of both kinds of errors. So:

• We’ll say that M accepts L with a false positive probability, ε_P, iff (w ∉ L) →
(Pr(M accepts w) ≤ ε_P).

• We'll say that M accepts L with a false negative probability, ε_N, iff (w ∈ L) →
(Pr(M rejects w) ≤ ε_N).

If  M  is  a  randomized  Turing  machine,  we  define  timereq(M)  and  spacereq(M)  as  for 
standard  Turing  machines.  In  both  cases,  we  measure  the  complexity  of  the  worst  case 
of  M's  performance  on  an  input  of  size  n. 

We're now in a position to define a set of complexity classes based on acceptance by
randomized Turing machines. In the next section, we'll define four such classes, all of
them focused on accepting in polynomial time. It is possible to define other classes as
well. For example, we could talk about languages that can be accepted by randomized
Turing machines that use logarithmic space.


30.2.2  The  Language  Classes  BPP,  RP,  Co-RP,  and  ZPP 

Our  goal  is  to  recognize  a  language  with  reasonable  accuracy  in  a  reasonable  amount 
of  time.  When  we  use  randomization  to  do  that,  there  are  two  kinds  of  failure  modes 
that  we  must  consider: 

•  The  algorithm  always  runs  efficiently  but  it  may  (with  small  probability)  deliver  an 
incorrect  answer.  Algorithms  with  this  property  are  called  Monte  Carlo  algorithms. 

•  The  algorithm  never  returns  an  incorrect  answer  but  it  may  (with  small  probability)  be 
very  expensive  to  run.  Algorithms  with  this  property  are  called  Las  Vegas  algorithms. 

We  can  define  complexity  classes  based  on  imposing  constraints  on  both  kinds  of  fail¬ 
ures.  We  begin  with  the  first.  Define: 

The Class BPP: L ∈ BPP iff there exists some probabilistic Turing machine M
that runs in polynomial time and that decides L with a false positive probability,
ε_P, and a false negative probability, ε_N, both less than 1/2. The name BPP stands
for Bounded-error, Probabilistic, Polynomial time.

A randomized Turing machine that decides a language in BPP implements a Monte
Carlo algorithm. It is allowed to make both kinds of errors (i.e., false positives and false
negatives) as long as the probability of making either of them is less than 1/2. We can
characterize such a machine in terms of a single error rate ε = max(ε_P, ε_N). The requirement
that ε be less than 1/2 may seem too weak. It’s hard to imagine
saying that M decides L if it only gets it right about half the time. But it is possible to
prove the following theorem.


THEOREM  30.1  Reducing  the  Error  Rate 

Theorem. Let M be a randomized, polynomial-time Turing machine with error
rate ε that is a constant equal to max(ε_P, ε_N). If 0 < ε < 1/2 and f(n) is any polynomial
function, then there exists an equivalent randomized, polynomial-time
Turing machine M' with error rate 2^{-f(n)}.




Proof: The idea is that M' will run M some polynomial number of times and return
the answer that appeared more often. If the runs are independent, then the probability
of error decreases exponentially as the number of runs of M increases. For
a detailed analysis that shows that the desired error bound can be achieved with
a polynomial number of runs of M, see [Sipser 2006].


So, for example, the definition of the class BPP wouldn't change if we required ε to
be less than 1/10 or 1/3000 or some still smaller bound. Note that such a bound can be
made substantially less than the probability that any computer on which M runs will
experience a hardware failure that would cause it to return an erroneous result.

The class BPP is closed under complement. In other words, BPP = co-BPP since
false positives and false negatives are treated identically.

Sometimes it is possible to build a machine to accept a language L and to guarantee
that only one kind of error will occur. It may be possible to examine a string w and to detect
efficiently some property that proves that w is in L. Or it may be possible to detect
efficiently some way in which w violates the membership requirement for L. So define:


The Class RP: L ∈ RP iff there exists some randomized Turing machine M that
runs in polynomial time and that decides L and where:

• if w ∈ L then M accepts w with probability 1 - ε_N, where ε_N < 1/2, and

• if w ∉ L then M rejects w with probability 1 (i.e., with false positive probability
ε_P = 0).


The name RP stands for Randomized, Polynomial time.


If L is in RP, then it can be decided by a randomized Turing machine that may reject
when it shouldn't. But it will never accept when it shouldn’t. Of course, it may also be
possible to build a machine that does the opposite. So define the complement of RP:

The Class co-RP: L ∈ co-RP iff there exists some randomized Turing machine M
that runs in polynomial time and that decides L and where:

• if w ∈ L then M accepts w with probability 1 (i.e., with false negative probability
ε_N = 0), and

• if w ∉ L then M rejects w with probability 1 - ε_P, where ε_P < 1/2.

Note that, as in the definition of BPP, the error probabilities required for either RP
or co-RP can be anything strictly between 0 and 1, without changing the set of languages
that can be accepted.

In the next section, we will present a randomized algorithm for primality testing. An
obvious way to decide whether a number is prime would be to look for the existence of
a factor that proves that the number isn't prime. The algorithm we will present doesn't
do that, but it does look for the existence of a certificate that proves that its input isn't
prime. If it finds such a certificate, it can report, with probability 1, that the input is
composite. If it fails to find such a certificate, then it reports that the input is prime.
That report has high probability of being the correct answer. We will use our algorithm
to show that the language PRIMES = {w : w is the binary encoding of a prime number}
is in co-RP and that the language COMPOSITES = {w : w is the binary encoding of
a composite number} is in RP.

But first let's consider what appears to be a different approach to the use of randomness.
Suppose that we want to require an error rate of 0 and, in exchange, we are
willing to accept a nonzero probability of a long run time. We call algorithms that satisfy
this requirement Las Vegas algorithms. To describe the languages that can be accepted
by machines that implement algorithms with this property, define:

The Class ZPP: L ∈ ZPP iff there exists some randomized Turing machine M
such that:

• if w ∈ L then M accepts w with probability 1,

• if w ∉ L then M rejects w with probability 1, and

• there exists a polynomial function f(n) such that, for all inputs w of length n,
the expected running time of M on w is less than f(n). It is nevertheless possible
that M may run longer than f(n) for some sequences of random events.

The  name  ZPP  stands  for  Zero-error,  Probabilistic,  Polynomial  time. 

There are two other, but equivalent, ways to define ZPP:

• ZPP is the class of languages that can be recognized by some randomized Turing
machine M that runs in polynomial time and that outputs one of three possible values:
Accept, Reject, and Don't Know. M must never accept when it should reject nor
reject when it should accept. Its probability of saying Don't Know must be less than
1/2. This definition is equivalent to our original one because it says that, if M runs
out of time before determining an answer, it can quit and say Don't Know.

• ZPP = RP ∩ co-RP. To prove that this definition is equivalent to our original one,
we show that each implies the other:

• (L ∈ ZPP) → (L ∈ RP ∩ co-RP): If L is in ZPP, then there is a Las Vegas-style
Turing machine M that accepts it. We can construct Monte Carlo-style Turing
machines M1 and M2 that show that L is also in RP and in co-RP, respectively.
On any input w of length n, M1 will run M on w for 2·f(n) steps, where f(n) is the
polynomial bound on M's expected running time, or until it halts. If M halts
naturally in that time, then M1 will accept or reject as M would have done.
Otherwise, it will reject. Since the expected running time of M is less than f(n),
the probability that M will have halted within 2·f(n) steps is greater than 1/2
(by Markov's inequality), so the probability that M1 will falsely reject a string
that is in L is less than 1/2. Since M1 runs in polynomial time, it shows that L is
in RP. Similarly, construct M2 that shows that L is in co-RP except that, if the
simulation of M does not halt, M2 will accept.

• (L ∈ RP ∩ co-RP) → (L ∈ ZPP): If L is in RP, then there is a Monte Carlo-style
Turing machine M1 that decides it and that never accepts when it shouldn't. If L
is in co-RP, then there is another Monte Carlo-style Turing machine M2 that decides
it and that never rejects when it shouldn't. From these two, we can construct
a Las Vegas-style Turing machine M that shows that L is in ZPP. On any
input w, M will first run M1 on w. If it accepts, M will halt and accept. Otherwise
M will run M2 on w. If it rejects, M will halt and reject. If neither of these things
happens, it will try again.
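The second construction (building a Las Vegas machine from an RP-style and a co-RP-style machine) can be simulated directly. In the sketch below, the two Monte Carlo machines are toy stand-ins: they consult a ground-truth membership function and then inject one-sided errors at an assumed rate of 0.4. Real machines would not have access to such an oracle; the helper names and the error rate are assumptions made only so that the combining loop can be run.

```python
import random

def make_rp_machine(in_language, rng, false_negative=0.4):
    """Toy RP-style machine: never accepts a nonmember, may reject a member."""
    def m1(w):
        return in_language(w) and rng.random() >= false_negative
    return m1

def make_corp_machine(in_language, rng, false_positive=0.4):
    """Toy co-RP-style machine: never rejects a member, may accept a nonmember."""
    def m2(w):
        return in_language(w) or rng.random() < false_positive
    return m2

def las_vegas(m1, m2, w):
    """Repeat until one machine gives an answer that cannot be wrong.
    Returns (answer, number of rounds used)."""
    rounds = 0
    while True:
        rounds += 1
        if m1(w):           # M1 never accepts when it shouldn't
            return True, rounds
        if not m2(w):       # M2 never rejects when it shouldn't
            return False, rounds
        # Neither answer was conclusive, so try again.
```

The answer is always correct. Only the number of rounds is random, and since each round is conclusive with probability at least 0.6 in this toy setting, the expected number of rounds is small.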

Randomization appears to be a useful tool for solving some kinds of problems. But
what can we say about the relationships among BPP, RP, co-RP, ZPP and the other complexity
classes that we have considered? The class P must be a subset of all four of the
randomized classes since a standard, deterministic, polynomial-time Turing machine
that doesn't happen to have any choice points satisfies the requirements for a machine
that accepts languages in all of those classes. Further, we have already shown that ZPP
is a subset of both RP and co-RP. So all of the following relationships are known.

• P ⊆ BPP

• P ⊆ ZPP ⊆ RP ⊆ NP

• P ⊆ ZPP ⊆ co-RP ⊆ co-NP

• RP ∪ co-RP ⊆ BPP

There  are  two  big  unknowns.  One  is  the  relationship  between  BPP  and  NP.  Neither 
is  known  to  be  a  subset  of  the  other.  The  other  is  whether  P  is  a  proper  subset  of  BPP. 
It is widely conjectured, but unproven, that BPP = P. If this is true, then randomization
is a useful tool for constructing practical algorithms for some problems but it is
not a technique that will make it possible to construct polynomial-time solutions for
NP-complete problems unless P = NP.


30.2.3  Primality  Testing 

One of the most important applications of randomized algorithms is the problem of primality
checking. We mentioned above that PRIMES = {w : w is the binary encoding of
a prime number} is in co-RP and COMPOSITES = {w : w is the binary encoding of a
composite number} is in RP. In this section, we will see why.

Recall that the obvious way to decide whether an integer p is prime is to consider all
of the integers between 2 and √p, checking each to see whether it divides evenly into p.
If any of them does, then p isn’t prime. If none does, then p is prime. The time required
to implement this approach is O(√p). But n, the length of the string that encodes p, is
log p. So this simple algorithm is O(2^{n/2}). It has recently been shown that PRIMES is in
P, so there exists a polynomial-time algorithm that solves this problem exactly.
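For reference, the obvious trial-division method looks like this in Python. It is a straightforward sketch (the function name is an assumption); its loop runs about √p times, which is exponential in the number of bits of p.

```python
from math import isqrt

def is_prime_trial_division(p):
    """Decide primality by trying every candidate factor from 2 to sqrt(p)."""
    if p < 2:
        return False
    for candidate in range(2, isqrt(p) + 1):
        if p % candidate == 0:      # found a factor, so p is composite
            return False
    return True                     # no factor up to sqrt(p) exists
```

For a 1024-bit value of p, the loop bound is on the order of 2^512 iterations, which is why this method is unusable for cryptographic sizes.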

But, well before that result was announced, randomized algorithms were being used
successfully in applications, such as cryptography, that require the ability to perform
primality checking quickly. One idea for a randomized algorithm that would check the
primality of p is to pick randomly some proposed factors of p and check them. If any of
them is a factor, then p is composite. Otherwise, claim that p is prime. The problem with
this idea is that, if p is large, most numbers may fail to be factors, even if p is composite.
So it would be necessary to try a very large number of possible factors in order to be
able to assert with high probability that p is prime. There is a better way.

The randomized algorithm that we are about to present is similar in its overall structure
to the factor-testing method that we just rejected. It will randomly choose some numbers
and check each to see whether it proves that p is not prime. If none of them does, it will report
that p is (highly likely to be) prime. Its effectiveness relies on a few fundamental facts
about modular arithmetic. To simplify the rest of this discussion, let x ≡_p y, read “x is
equivalent to y mod p,” mean that x and y have the same remainder when divided by p.

The first result that we will use is known as Fermat's Little Theorem. It tells us the
following:

• If p is prime, then, for any positive integer a, if gcd(a, p) = 1, a^{p-1} ≡_p 1.

Recall that the greatest common divisor (gcd) of two integers is the largest integer
that is a factor of both of them. We'll say that p passes the Fermat test at a iff a^{p-1} ≡_p 1.

For example, let p = 5 and a = 3. Then 3^{(5-1)} = 81 ≡_5 1. So 5 passes the Fermat test at
3, which it must do since 5 is prime and 3 and 5 are relatively prime. But now let p = 8
and a = 3. Then 3^{(8-1)} = 2187 ≡_8 3. So 8 fails the Fermat test at 3, which is consistent
with the theorem, since 8 is not prime. Whenever p fails the Fermat test at a, we’ll say
that a is a Fermat witness that p is composite.

Fermat's Little Theorem tells us that if p is prime, then it must pass the Fermat test
at every appropriately chosen value of a. Can we turn this around? If p passes the Fermat
test at some value a, do we know that p is prime? The answer to this question is no.
If p is composite and yet it passes the Fermat test at a, we will say that a is a Fermat liar
that p is prime.
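The Fermat test at a single value a is a one-line computation with Python's built-in three-argument pow, which performs modular exponentiation. The function name below is an assumption for this sketch. The composite 341 = 11 · 31 is used as an extra illustration that does not appear in the text: 2 is a Fermat liar for it, while 3 is a Fermat witness.

```python
def passes_fermat_test(p, a):
    """Return True iff a^(p-1) is congruent to 1 modulo p."""
    return pow(a, p - 1, p) == 1

# The two examples worked in the text.
five_at_three = passes_fermat_test(5, 3)     # 3^4 = 81, and 81 mod 5 = 1
eight_at_three = passes_fermat_test(8, 3)    # 3^7 = 2187, and 2187 mod 8 = 3

# An additional illustration (not from the text): 341 = 11 * 31 is composite.
liar = passes_fermat_test(341, 2)            # 2 is a Fermat liar for 341
witness = not passes_fermat_test(341, 3)     # 3 is a Fermat witness for 341
```

A single witness proves compositeness, whereas a passed test at one value of a proves nothing by itself.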

Fermat’s Little Theorem is the basis for a simple randomized algorithm for deciding
the primality of p. We'll randomly choose values for a, looking for a witness
that p is composite. We’ll only consider values that are less than p. So, if p is prime,
gcd(a, p) will always be 1. Thus our algorithm will not have to evaluate gcd. If we
fail to find a witness that shows that p is composite, we'll report that p is probably
prime. Because liars exist, we can increase the likelihood of finding such a witness,
if one exists, by increasing the number of candidate witnesses that we test. So we’ll
present an algorithm that takes two inputs, a value to be tested and the number of
possible witnesses that should be checked. The output will be one of two values:
composite and probably prime.

simpleFermat(p: integer, k: integer) =

1. Do k times:

1.1. Randomly select a value a in the range [2: p - 1].

1.2. If it is not true that a^{p-1} ≡_p 1, then return composite.

2. All tests have passed. Return probably prime.
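A direct Python transcription of simpleFermat might look like the following. It is a sketch that assumes p > 2 (so that the range [2 : p - 1] is nonempty) and uses the built-in pow for modular exponentiation; the snake_case name and the optional rng parameter are assumptions.

```python
import random

def simple_fermat(p, k, rng=random):
    """Return 'composite' or 'probably prime' after k random Fermat tests.
    Assumes p > 2."""
    for _ in range(k):
        a = rng.randrange(2, p)          # step 1.1: a in [2, p - 1]
        if pow(a, p - 1, p) != 1:        # step 1.2: a is a Fermat witness
            return "composite"
    return "probably prime"              # step 2: no witness was found
```

A prime input can never produce the answer composite, so the only possible error is reporting probably prime for a composite input.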

Modular exponentiation can be implemented efficiently using the technique of successive
squaring that we describe in Example J.1. So simpleFermat runs in polynomial time.
All that remains is to determine its error rate as a function of k. With the exception of a
small class of special composite numbers that we will describe below, if p is composite,
then the chance that any a is a Fermat liar for it is less than 1/2. So, again with the exception
we are about to describe, the error rate of simpleFermat is less than 1/2^k.
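To show why modular exponentiation is cheap, here is one common way to implement successive squaring. It is offered as a sketch in the spirit of the technique mentioned above, not as a reproduction of the referenced example; the function name is an assumption. The loop runs once per bit of the exponent, so the number of multiplications is linear in the length of the exponent.

```python
def mod_exp(base, exponent, modulus):
    """Compute (base ** exponent) % modulus by successive squaring.
    Assumes exponent >= 0 and modulus >= 1."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:                      # low bit set: use this power
            result = (result * base) % modulus
        base = (base * base) % modulus        # square for the next bit
        exponent >>= 1                        # move to the next bit
    return result
```

Reducing modulo the modulus after every multiplication keeps the intermediate values no larger than the square of the modulus.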

But now we must consider the existence of composite numbers that pass the Fermat
test at all values of a that are relatively prime to them. Call such numbers Carmichael
numbers. Every value of a that is relatively prime to a Carmichael number is a Fermat
liar for it, so increasing k does little to help simpleFermat realize
that a Carmichael number isn't prime.
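The behavior of Carmichael numbers can be checked by brute force for the smallest one, 561 = 3 · 11 · 17. The sketch below compares the set of Fermat liars with the set of bases that are relatively prime to n; a base that shares a factor with n can never pass the test, so only the relatively prime bases are candidates. The helper names are assumptions, and 341 is included only as a contrasting non-Carmichael composite.

```python
from math import gcd

def fermat_liars(n):
    """All a in [2, n - 1] at which n passes the Fermat test."""
    return [a for a in range(2, n) if pow(a, n - 1, n) == 1]

def coprime_bases(n):
    """All a in [2, n - 1] that are relatively prime to n."""
    return [a for a in range(2, n) if gcd(a, n) == 1]

# For the Carmichael number 561, every relatively prime base is a liar.
carmichael_all_lie = fermat_liars(561) == coprime_bases(561)

# For the ordinary composite 341, some relatively prime bases are witnesses.
ordinary_has_witnesses = len(fermat_liars(341)) < len(coprime_bases(341))
```

For 561 the only witnesses are the bases that share a factor with it, which a random choice is unlikely to hit when the prime factors of a number are large.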

However, there is a separate randomized test that we can use to detect Carmichael
numbers. It is based on the following fact: If p is prime, then 1 has exactly two square
roots (mod p): 1 and -1. If, on the other hand, p is composite, it is possible that 1 has
three or more square roots (mod p). For example, let p = 8. Then we have:

• 1^2 = 1.

• 3^2 = 9 ≡_8 1.

• 5^2 = 25 ≡_8 1. Note that 5 ≡_8 -3.

• 7^2 = 49 ≡_8 1. Note that 7 ≡_8 -1.

So we can write the four square roots of 1 (mod 8) as 1, -1, 3, -3. For every
Carmichael number n, 1 has more than two square roots (mod n). For example, the
smallest Carmichael number is 561. The square roots of 1 (mod 561) are 1, -1 (560),
67, -67 (494), 188, -188 (373), 254, and -254 (307).
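These lists of square roots are easy to confirm by exhaustive search. The following sketch (the function name is an assumption) simply tries every value from 1 to p - 1, which is fine for small moduli but is not how the test described next finds extra roots.

```python
def square_roots_of_one(p):
    """All x in [1, p - 1] with x^2 congruent to 1 modulo p."""
    return [x for x in range(1, p) if (x * x) % p == 1]

roots_mod_7 = square_roots_of_one(7)       # a prime: only 1 and -1 (6)
roots_mod_8 = square_roots_of_one(8)       # 1, 3, 5 (-3), 7 (-1)
roots_mod_561 = square_roots_of_one(561)   # eight roots for 561
```

A prime modulus yields exactly two roots, while 561 yields eight: 1, 67, 188, 254, 307, 373, 494, and 560.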

While simpleFermat cannot distinguish between primes and Carmichael numbers, a
randomized test based on finding square roots can. We could design such a test that just
chooses random values and checks to see whether they are square roots of 1 (mod p)
other than 1 and -1. If any is, then p isn’t prime. And, unlike with simpleFermat, there
exist witnesses even for Carmichael composite numbers. But there's a more efficient way
to find additional square roots if they exist. Suppose that we have done the simpleFermat
test at a and a has passed. Then we know that a^{p-1} ≡_p 1. Taking the square root of both
sides, we get that a^{(p-1)/2} is a square root of 1 (mod p). If a^{(p-1)/2} ≡_p 1, we haven’t
learned anything. But then we can again take the square root of both sides and continue
until one of the following things happens.

1.  We  get  a  root  that  is  —  1 .  We’re  not  interested  in  finding  square  roots  of  —1.  So  we 
give  up,  having  failed  to  show  that  any  additional  roots  exist.  It  is  possible  that  p 
is  prime  or  that  it  is  composite. 

2. We get a noninteger. Again, we simply stop as in case 1.

3.  We  get  a  root  that  is  neither  1  nor  —1.  We  have  shown  that  p  is  composite. 

So  (taking  all  results  (mod  p))  we  check: 

• a^{p-1} (and, as in simpleFermat, assert composite if we get any value other than 1), then

• a^{(p-1)/2} (and assert composite if we get any value other than 1 or -1), then

• a^{(p-1)/4} (and assert composite if we get any value other than 1 or -1), and so forth,
quitting as described above.

The most efficient way to generate this set of tests is in the opposite order. But, to do
that, we need to know where to start. In particular, we need to know when the exponent (if
we were going in the order we described above) would no longer be an integer. Suppose
that p - 1 is represented in binary. Then it can be rewritten as $d \cdot 2^s$, where d is
odd. (The number s is the number of trailing 0's and the number d is what is left after
the trailing 0's are removed.) The number of times that we would be able to take the
square root of $a^{p-1}$ and still have an integer exponent is s. So we compute (mod p)
the reverse of the sequence we described above:

$a^{d \cdot 2^0}, a^{d \cdot 2^1}, \ldots, a^{d \cdot 2^s}$

Then we check the sequence right to left. If the last element is not 1, then a fails the
simpleFermat test and we can report that p is composite. Otherwise, as long as the values
are 1, we continue. If we encounter -1, we must quit and report that we found no
evidence that p is composite. If, on the other hand, we find some value other than 1 or
-1, we can report that p is composite.
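
As an illustration of the decomposition and the right-to-left check, here is a minimal
sketch. The helper names (decompose, squaring_sequence, check_right_to_left) and the
returned strings are our own choices, not part of the text. Each element of the sequence
is obtained by squaring the previous one, so only one modular exponentiation is needed.

```python
def decompose(m: int) -> tuple[int, int]:
    """Write m > 0 as d * 2**s with d odd, and return (d, s)."""
    if m <= 0:
        raise ValueError("m must be positive")
    d, s = m, 0
    while d % 2 == 0:   # strip trailing binary zeros
        d //= 2
        s += 1
    return d, s


def squaring_sequence(a: int, p: int) -> list[int]:
    """Return [a^(d*2^0), a^(d*2^1), ..., a^(d*2^s)] mod p, where p - 1 = d * 2^s."""
    d, s = decompose(p - 1)
    x = pow(a, d, p)
    seq = [x]
    for _ in range(s):
        x = (x * x) % p
        seq.append(x)
    return seq


def check_right_to_left(seq: list[int], p: int) -> str:
    """Scan the sequence from the right, as described in the text."""
    if seq[-1] != 1:
        return "composite"          # a fails the simple Fermat test
    for x in reversed(seq[:-1]):
        if x == p - 1:              # -1 (mod p): give up
            return "no evidence"
        if x != 1:
            return "composite"      # a square root of 1 other than 1 or -1
    return "no evidence"


if __name__ == "__main__":
    print(squaring_sequence(2, 561))                            # [263, 166, 67, 1, 1]
    print(check_right_to_left(squaring_sequence(2, 561), 561))  # composite
    print(check_right_to_left(squaring_sequence(2, 13), 13))    # no evidence
```

For p = 561 and a = 2, the scan meets 67 immediately after a 1, which exposes 561 as
composite even though 2 passes the simple Fermat test.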

Using this idea, we can state the following algorithm, which is generally known as
the Miller-Rabin test:

Miller-Rabin(p:  integer,  k:  integer)  = 

1.  If  p  =  2,  return  prime.  Else,  if  p  is  even,  return  composite. 

2. Rewrite p - 1 as $d \cdot 2^s$, where d is odd.

3.  Do  k  times: 

3.1.  Randomly  select  a  value  a  in  the  range  [2:  p  -  1]. 

3.2. Compute the following sequence (mod p):

$a^{d \cdot 2^0}, a^{d \cdot 2^1}, \ldots, a^{d \cdot 2^s}$.

3.3. If the last element of the sequence is not 1, then a fails the simpleFermat
test. Return composite.

3.4. For i = s - 1 down to 0 do:

If $a^{d \cdot 2^i} = -1$, then exit this loop. Otherwise, if it is not 1, then
return composite.

4.  All  tests  have  passed.  Return  probably  prime. 

Miller-Rabin runs in polynomial time and can be shown to have an error rate
that is less than $1/4^k$. So it proves the claim, made above, that the language
COMPOSITES is in RP. The efficiency of the algorithm can be improved in various ways.
One is to check the elements of the sequence as they are generated. It's harder to see
how to do that correctly, but it can cut out some tests. In fact, the algorithm is
generally stated in that form.
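
The following is a sketch of the variant that checks elements as they are generated. It is
one common way of writing the test, not the text's own statement of it. It accepts a value
a exactly when $a^d$ is 1 or -1, or when some later element before the last is -1, which
is the same condition that the right-to-left scan tests. The function name, the returned
strings, the optional rng parameter, and the rejection of p < 2 are our assumptions.

```python
import random
from typing import Optional


def miller_rabin(p: int, k: int, rng: Optional[random.Random] = None) -> str:
    """Return 'prime', 'composite', or 'probably prime' after k random trials."""
    if p < 2:
        raise ValueError("p must be at least 2")  # assumption: case not covered in the text
    if p == 2:
        return "prime"
    if p % 2 == 0:
        return "composite"
    rng = rng or random.Random()

    # Rewrite p - 1 as d * 2**s with d odd.
    d, s = p - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1

    for _trial in range(k):
        a = rng.randint(2, p - 1)
        x = pow(a, d, p)
        if x == 1 or x == p - 1:
            continue                  # every later element is 1: no evidence
        for _step in range(s - 1):
            x = (x * x) % p
            if x == p - 1:
                break                 # reached -1: no evidence from this a
        else:
            return "composite"        # never reached -1, so a is a witness
    return "probably prime"


if __name__ == "__main__":
    print(miller_rabin(7919, 20))  # 7919 is prime, so this prints "probably prime"
    print(miller_rabin(561, 20))   # a Carmichael number; almost surely "composite"
```

A prime is never reported as composite. A composite can be reported as probably prime only
if every one of the k random choices fails to be a witness.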


While randomized algorithms provide a practical way to check for primality and
thus to find large prime numbers, they do not tell us how to factor a large number
that is known not to be prime. Modern cryptographic techniques, such as the
RSA algorithm, rely on two important facts: Generating primes can be done
efficiently, but no efficient technique for factoring composites is known. (J.3)


30.3 Heuristic Search

For  some  problems,  randomized  search  works  well.  But  suppose  that  we  have  some 
useful  information  about  the  shape  of  the  space  that  we  are  attempting  to  search.  Then 
it  may  make  sense  not  to  behave  randomly  but  instead  to  exploit  our  knowledge  each 
time a choice needs to be made.

30.3.1 An Introduction to Heuristic Search

A large class of important problems can be described generically as:

• a space of states that correspond to configurations of the problem situation.

• a start state.

• one or more goal states. If there are multiple goal states, then the set of them must
be efficiently decidable.

• a set of operators (with associated costs) that describe how it is possible to move
from one state to another.

Many puzzles are easy to state in this way. For example:

• In the 15-puzzle, which we described in Example 4.8, the states correspond to
arrangements of tiles on the board. An instance of the puzzle specifies a particular
arrangement of tiles as the start state. There is a single goal state in which the tiles
are arranged in numeric order. And there is a set of legal moves. Specifically, it is
possible to move from one state to another by sliding any tile that is adjacent to the
empty square into the empty square.

• In the game Instant Insanity, which we describe in N.2.2, the states correspond to
the arrangement of blocks to form a stack. The start state describes a set of blocks,
none of which is in the stack. There is a set of goal states (since a goal state is any
state in which there is a stack that contains all the blocks and the colors are lined up
as required). And there is a set of operators that correspond to adding and removing
blocks from the stack.

More  significantly,  many  real  problems  can  also  be  described  as  state  space  search. 
For  example,  an  airline  scheduling  problem  can  be  described  as  a  search  through  a 
space  in  which  the  states  correspond  to  partial  assignments  of  planes  and  crews  to 
routes.  The  start  state  contains  no  assignments.  Any  state  that  assigns  a  plane  and  a 
crew  to  every  flight  and  that  meets  some  prescribed  set  of  constraints  is  a  goal  state. 
The  operators  move  from  one  state  to  the  next  by  making  (or  unmaking)  plane  and 
crew  assignments. 

Now suppose that we are given a state space search problem and asked to find the
shortest (cheapest) path from the start state to a goal. One approach is to conduct a
systematic search through the state space by applying operators, starting from the start
state. The problem with this technique is that, for many problems, the number of possible
states and thus the number of paths that might have to be examined grows exponentially
with the size of the problem being considered. For example, if we generalize the 15-puzzle
to the n-puzzle (where $n = k^2 - 1$ for some positive integer k), then the number of
distinct puzzle states is (n + 1)!. It can be shown that the problem of finding the shortest
sequence of moves that transforms a given start state into the goal state is NP-hard.

In Section 28.7.4, we showed, by exhibiting a polynomial-time algorithm to decide it, that
the language SHORTEST-PATH = {<G, u, v, k> : G is an unweighted, undirected graph, u
and v are vertices in G, k ≥ 0, and there exists a path from u to v whose length is at most
k} is in P. And we pointed out that the extension of SHORTEST-PATH to the case of
weighted graphs is also in P. So why can't an instance of the n-puzzle (and other problems
like it) be solved in polynomial time? The problem is to find the shortest path from the
start state to a goal state and we appear to know how to do that efficiently.

The answer is simple. An instance of SHORTEST-PATH must contain an explicit
description of the entire space that is to be searched. The vertices of G correspond to
the states and the edges of G correspond to the operators that describe the legal moves
from one state to the next. When we say that SHORTEST-PATH is in P, we mean that
the amount of time that is required to search the space and find the shortest path is a
polynomial function of the length of that complete state description.

But now consider an instance of the n-puzzle. It can be described much more
succinctly. Instead of explicitly enumerating all the board configurations and the moves
between them, we can describe just the start and goal configurations, along with a function
that defines the operators. This function, when given a state, returns a set of successor
states and associated costs. What we do not have to do is to list explicitly the (n + 1)!
states that describe configurations of the puzzle. An exhaustive search that requires
considering that list would require time that is exponential in the length of the succinct
description that we want to use, even though it would have been polynomial in the
length of the explicit description that is required for an instance of SHORTEST-PATH.
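
To make the idea of a succinct description concrete, here is a minimal sketch of a
successor function for the n-puzzle. The representation is our assumption: a state is a
tuple of tile numbers in row-major order, 0 marks the empty square, and every move costs 1.

```python
import math
from typing import List, Tuple

State = Tuple[int, ...]  # tiles in row-major order; 0 marks the empty square


def successors(state: State) -> List[Tuple[State, int]]:
    """Return (successor state, operator cost) pairs for a k-by-k sliding puzzle."""
    k = math.isqrt(len(state))
    if k * k != len(state):
        raise ValueError("state must describe a square board")
    blank = state.index(0)
    row, col = divmod(blank, k)
    result = []
    # Slide the tile above, below, left of, or right of the empty square into it.
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < k and 0 <= c < k:
            tile = r * k + c
            board = list(state)
            board[blank], board[tile] = board[tile], board[blank]
            result.append((tuple(board), 1))  # assumed unit cost per move
    return result


if __name__ == "__main__":
    goal = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # the 8-puzzle goal, as an example
    for nxt, cost in successors(goal):
        print(nxt, cost)
```

The function occupies a few lines regardless of k, while the space it implicitly defines
contains (n + 1)! board arrangements.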

Simple problems like the n-puzzle and Instant Insanity, as well as real problems, like
airline scheduling, are only solvable in practice when:

• there exists a succinct problem description, and

• there exists a search technique that can find acceptable solutions without expanding
the entire implicitly defined space.

We've already described a way to construct succinct problem descriptions as state
space search. It remains to find efficient algorithms that can search the spaces that are
defined in that way. For many problems, if we want an optimal solution and we have no
additional information about how to find one, we are stuck. But for many problems,
additional information is available. All we have to do is to find a way to exploit it.

A heuristic is a rule of thumb. It is a technique that, while not necessarily guaranteed
to work exactly all of the time, is useful as a problem-solving tool. The word
“heuristic” comes from the Greek word εὑρίσκειν (heuriskein), meaning “to find”
or “to discover,” which is also the root of the word “eureka,” derived from Archimedes'
reputed exclamation, heurika (meaning “I have found”), spoken when he had just
discovered a method for determining the purity of gold. Heuristics typically work because
they exploit relevant knowledge about the problem that they are being used to solve.

A heuristic search algorithm is a search algorithm that exploits knowledge of its
problem space to help it find an acceptable solution efficiently. One way to encode that
knowledge is in the operators that are supplied to the program. For example, instead of
defining operators that correspond to all the legal moves in a problem space, we might
define only operators that correspond to generally “sensible” moves. Another very
useful way is to define a heuristic function whose job is to examine a state and return a
measure of how desirable it is. That score can then be used by the search algorithm as
it chooses which states to explore next. It is sometimes useful to define heuristic
functions that assign high scores to states that merit further exploration. In other cases, it
is useful to define heuristic functions that measure cost. For example, we might assign to
a state a score that estimates the cost of getting from that state to a goal. When we do
this, we assign low scores to the states that most merit further consideration.
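
As one example of a cost-measuring heuristic function, the sketch below computes the
Manhattan distance for a sliding-tile puzzle: the sum, over all tiles, of the number of
rows plus columns separating a tile from its goal square. The state representation (a
row-major tuple with 0 for the empty square) and the goal layout (tiles 1 through n in
order, empty square last) are our assumptions.

```python
import math
from typing import Tuple

State = Tuple[int, ...]  # tiles in row-major order; 0 marks the empty square


def manhattan_distance(state: State) -> int:
    """Estimate the number of moves from state to the goal (1, 2, ..., n, 0)."""
    k = math.isqrt(len(state))
    total = 0
    for index, tile in enumerate(state):
        if tile == 0:
            continue                       # the empty square is not a tile
        row, col = divmod(index, k)        # where the tile is now
        goal_row, goal_col = divmod(tile - 1, k)  # where it belongs
        total += abs(row - goal_row) + abs(col - goal_col)
    return total


if __name__ == "__main__":
    print(manhattan_distance((1, 2, 3, 4, 5, 6, 7, 8, 0)))  # 0: already the goal
    print(manhattan_distance((1, 2, 3, 4, 5, 6, 7, 0, 8)))  # 1: one slide away
```

Because each move slides exactly one tile by one square, this score cannot exceed the
number of moves that are actually required when every move costs 1.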


30.3.2 The A* Algorithm

In this section, we will describe one very general and effective heuristic search algorithm.
The A* algorithm finds the cheapest path from a start state to a goal state in a succinctly
described state space. It exploits a version of best-first search in which a heuristic
function that evaluates states as they are generated guides the algorithm so that it looks
first in the part of the space that is most likely to contain the desired solution.


The  A*  algorithm  is  widely  used  to  plan  routes  for  the  agents  in  video 
games.  (N.3.2) 


Because what we are trying to do is to find a cheapest path, the score we would like
to be able to compute, for any state n, is:

f*(n) = cost of getting from the start state to a goal state via a path that goes
through n.

We can break f*(n) into two components:

f*(n) = g*(n) + h*(n), where:

• g*(n) is the cost of getting from the start state to n, and

• h*(n) is the cost of getting the rest of the way, i.e., the cost of getting from n to a goal.

If we have generated the state n, then we know the cost of at least one way of getting
to it. So we have an estimate of g*(n). But we don't know h*(n). If, however, we have
information about the problem that allows us to estimate h*(n), we can use it. We'll
denote an estimate of a function by omitting the * symbol. So we have:

f(n) = g(n) + h(n).

The function f(n) will be used to guide the search process. The function h(n) will
evaluate a state and return an estimate of the cost of getting from it to a goal.

In  the  rest  of  this  discussion,  we  will  assume  two  things: 

• There is some positive number c such that all operator costs are at least c. We make
this assumption because, if negative costs are allowed, there may be no cheapest
path from the start state to a goal. It is possible, in that case, that any path could be
made cheaper by repeating some negative cost operator one more time. And if
costs can keep getting smaller and smaller, then the cheapest path to a goal might
be one with an infinite number of steps. No procedure that halts will ever be able to
output such a path.

•  Every  state  has  a  finite  number  of  successor  states. 




The most straightforward version of the A* algorithm conducts a search through a
tree of possible paths. In this version, we ignore the possibility that the same state
might be generated along several paths. If it is, it will be explored several times. We'll
present this version first. Then we'll consider a graph-based version of the technique. In
this second algorithm, we will check, when a state is generated, to see if it has been
generated before. If it has, we will collapse the paths. The second version is more complex
to state, but it may be substantially more efficient at solving problems in which it is
likely that the same state could be reached in many different ways.

To see the difference, consider the partial search shown in Figure 30.1. Suppose that,
given the search as shown in (a), the next thing that happens is that state C is
considered, its successors are generated, and one of its successors is the state labeled E.
The tree-search version of A* won't notice that E has already been generated another way.
It will simply build a new search tree node E', as shown in (b), that happens to correspond
to the same state as E. If it decides that E is a good state to continue working from, it
may explore the entire subtree under E twice, once for E and once for E'. On the other
hand, the graph-search version of A* will notice that it has generated E before. It will
build the search graph shown as (c).

A* is a best-first search algorithm. So it proceeds, at each step, by generating the
successors of the state that looks most promising as a way to get cheaply to a goal. To see
how it works, consider the search shown in Figure 30.2. State A is the start state. It is
expanded (i.e., its successors are generated) first. In this example, shown in (a), it has two
successors: B, which costs 1 to generate, and C, which costs 3 to generate. Let's say that
the value of h(B) is 3, so f(B) = g(B) + h(B) = 1 + 3 = 4. Similarly, if h(C) is 2, then
f(C) = 3 + 2 = 5. The expression (g + h) for each state is shown directly under it.

A* maintains a set, called OPEN, of the nodes (corresponding to states) that have
been generated but not yet expanded. So OPEN is now {B, C}. A* chooses to expand
next the element of OPEN that has the lowest f value. That element is B. So B's
successors are generated, producing the search tree shown in (b). Notice that the cost of
getting to D is 1 + 2; the cost of getting to E is 1 + 3. OPEN is now {C, D, E}. The state
with the lowest f value is C, so it is expanded next. Suppose it has one successor, F.
Then the search tree is as shown in (c). F will be expanded next, and so forth, until a
goal (a state with an h value of 0) is expanded.


FIGURE 30.1 Tree search versus graph search.

FIGURE 30.2 Best-first search.

Note  two  things  about  the  process  that  we  have  just  described: 

• If the subtree under B had remained promising, none of the subtree under C would
have been generated. If the subtree under C remains promising, no more of the
subtree under B will be generated. If C does turn out to be on the shortest path to a
goal, we wasted some time exploring the subtree under B because h(B) underestimated
the cost of getting to a goal from B. In so doing, it made B look more promising
than it was. The better h is at estimating the true cost of getting to a goal, the
more efficient A* will be.

• The search process cannot stop as soon as a goal state is generated. Goal states have
h values of 0. But a goal state may have a high value of f = g + h if the path to it was
expensive. If we want to guarantee to find the shortest path to a goal, the search
process must continue until a goal state is chosen for expansion (on the basis of having
the lowest total f value). To see why this is so, return to the situation shown above
as (c). Suppose that F has a single successor G, it costs 6 to go from F to G, and G is a
goal state. Then we have f(G) = g(G) + h(G) = 12 + 0 = 12. If the search
process quits now, it has found a path of cost 12. But, given what we know, it is
possible that either D or E could lead to a cheaper path. To see whether or not one of them
does, we must expand them until all of their successors have f values of 12 or more.

The algorithm A*-tree, which we state next, implements the process that we just
described. We'll state the algorithm in terms of nodes in a search tree. Each node
corresponds to a state in the problem space.

A*-tree(P: state space search problem) =

1. Start with OPEN containing only the node corresponding to P's start state. Set
that node's g value to 0, its h value to whatever it is, and its f value to 0 + h = h.

2. Until an answer is found or there are no nodes left in OPEN do:

2.1. If there are no nodes left in OPEN, return Failure. There is no path
from the start state to a goal state.

2.2. Choose from OPEN a node such that no other node has a lower f
value. Call the chosen node BESTNODE. Remove it from OPEN.

2.3. If BESTNODE is a goal node, halt and return the path from the initial
node to BESTNODE.

2.4. Generate the successors of BESTNODE. For each of them do:

Compute f, g, and h, and add the node to OPEN.

If there exist any paths from the start state to a goal state, A*-tree will find one of
them. So we'll say that A*-tree is complete.

But we can make an even stronger claim: We'll say that h(n) is admissible iff it never
overestimates the true cost h*(n) of getting to a goal from n. If h is admissible, then
A*-tree finds an optimal (i.e., cheapest) path. To see why this is so, consider the role of h.

•  If  h  always  returns  0,  it  offers  no  information.  A*-tree  will  choose  BESTNODE 
based  only  on  the  computed  cost  of  reaching  it.  So  it  is  guaranteed  to  find  a  path 
with  a  lowest  cost.  If,  in  addition  to  h  being  0,  all  operator  costs  are  the  same,  then 
A*-tree  becomes  breadth-first  search. 

• If h(n) always returns h*(n), i.e., the exactly correct cost of getting to a goal from n,
then A*-tree will walk directly down an optimal path and return it.

• If h(n) overestimates h*(n), then it effectively “hides” a path that might turn out to
be the cheapest. To see how this could happen, consider the search trees shown in
Figure 30.3. After reaching the situation shown in (b), A*-tree will halt and return
the path (with cost 5) from A to B to D. But suppose that there is an operator with
a cost of 1 that can be applied to C to produce a goal. That would produce a path of
cost 4. A*-tree will never find that path because the h estimate of 12 blocked it from
being considered.

• But if h(n) errs in the other direction and underestimates the true cost of getting to
a goal from n, its error will be discovered when the observed cost of the path
exceeds the estimated cost without a goal being found. When that happens, A*-tree
will switch to a cheaper path if there is one.


FIGURE 30.3 What happens if h overestimates h*.

So the only way that A*-tree can find and return a path that is more expensive than
some other path it could have found is the case in which h overestimates the true cost h*.
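
The A*-tree procedure, and the effect of an overestimating h, can be sketched as follows.
This is an illustration under stated assumptions, not a definitive implementation: OPEN is
kept in a binary heap ordered by f, each heap entry carries its whole path instead of
backward pointers, and the graph is hypothetical. Its edge costs were chosen only so that
the path A, B, D costs 5 and the path through C costs 4, as in the discussion above.

```python
import heapq
from itertools import count

# Hypothetical operator costs (the figure's own edge costs are not given in the text).
GRAPH = {
    "A": [("B", 2), ("C", 3)],
    "B": [("D", 3)],   # D is a goal: the path A, B, D costs 5
    "C": [("G", 1)],   # G is a goal: the path A, C, G costs 4
    "D": [],
    "G": [],
}
GOALS = {"D", "G"}


def successors(state):
    return GRAPH[state]


def is_goal(state):
    return state in GOALS


def a_star_tree(start, is_goal, successors, h):
    """Tree-search A*: return (path, cost), or None if OPEN becomes empty."""
    tie = count()  # tie-breaker so the heap never has to compare paths
    open_heap = [(h(start), next(tie), 0, [start])]
    while open_heap:
        _f, _tie, g, path = heapq.heappop(open_heap)  # BESTNODE: lowest f in OPEN
        state = path[-1]
        if is_goal(state):
            return path, g
        for nxt, cost in successors(state):
            g_next = g + cost
            heapq.heappush(open_heap, (g_next + h(nxt), next(tie), g_next, path + [nxt]))
    return None


def h_zero(state):
    return 0  # never overestimates


def h_over(state):
    return 12 if state == "C" else 0  # overestimates at C (true remaining cost is 1)


if __name__ == "__main__":
    print(a_star_tree("A", is_goal, successors, h_zero))  # (['A', 'C', 'G'], 4)
    print(a_star_tree("A", is_goal, successors, h_over))  # (['A', 'B', 'D'], 5)
```

With h_zero the cheaper path through C is found. With h_over, C's f value of 15 keeps it in
OPEN until after the goal D has been chosen for expansion, so the cost-5 path is returned.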

Some simple heuristic functions are always admissible. For example, if the true cost
of getting between points A and B is the distance between them along roads in a plane,
then Euclidean distance (the length of a straight line between two points) is admissible.
And, of course, the heuristic function that simply returns 0 is always admissible. But,
for some problems, it may be hard to find a heuristic function that is informative but
never runs the risk of overestimating true costs. In those cases, the following further
observation is important. We'll call it the graceful decay of admissibility. If h rarely
overestimates h* by more than δ, then A*-tree will rarely find a solution whose cost is
more than δ greater than the cost of the optimal solution. So, as a practical matter,
A*-tree will find very good paths unless h makes large errors of overestimation.

So we have that A*-tree is optimal in one sense: It finds the best solutions. Search
algorithms that are optimal in this sense are called admissible. So A*-tree is admissible.
But is its own performance optimal or might it be possible to find cheapest paths by
exploring a smaller number of nodes? The answer is that A*-tree is not optimal in this
sense. The reason is that it may explore identical subtrees more than once. As we
suggested above, the way to fix this problem is to let it search a state graph rather than a
state tree.

We'll give the name A* to the version of A*-tree that searches a graph. We present
it next. A* differs from A*-tree in the following ways.

• A* exploits two sets of nodes: OPEN, which functions as in A*-tree and contains
those nodes that have been generated but not expanded, and CLOSED, which contains
those nodes that have already been expanded.

• Both A*-tree and A* must be able to trace backward from a goal so that they can return
the path that they find. A*-tree can do that trivially by simply storing bi-directional
pointers as it builds its search tree. But A* searches a graph. So it must explicitly record,
at each node, the best way of getting to that node from the start node. Whenever a new
path to node n is found, its backward pointer may change.

• Suppose that, in A*, a new and cheaper path to node n is found after node n has
been expanded. Clearly n's g value changes. But the cheaper path to n may also
mean a cheaper path to n's successors. So it may be necessary to revisit them and
update their backward pointers and their g values.

A*(P:  state  space  search  problem)  = 

1. Start with OPEN containing only the node corresponding to P's start state.
Set that node's g value to 0, its h value to whatever it is, and its f value to
0 + h = h. Set CLOSED to the empty set.

2. Until an answer is found or there are no nodes left in OPEN do:

2.1. If there are no nodes left in OPEN, return Failure. There is no path
from the start state to a goal state.

2.2. Choose from OPEN a node such that no other node has a lower f
value. Call the chosen node BESTNODE. Remove it from OPEN.
Place it in CLOSED.

2.3. If BESTNODE is a goal node, halt and return the path from the initial
node to BESTNODE.

2.4. Generate the successors of BESTNODE. But do not add them to the
search graph until we have checked to see if any of them correspond to
states that have already been generated. For each SUCCESSOR do:

2.4.1.  Set  SUCCESSOR  to  point  back  to  BESTNODE. 

2.4.2.  Compute  g(SUCCESSOR)  =  g(BESTNODE)  +  the  cost  of 
getting  from  BESTNODE  to  SUCCESSOR. 

2.4.3. See if SUCCESSOR corresponds to the same state as any node in
OPEN. If so, call that node OLD. Since this node already exists in
the graph, we can throw SUCCESSOR away and add OLD to the
list of BESTNODE's successors. But first we must decide whether
OLD's backward pointer should be reset to point to
BESTNODE. It should be if the path we have just found to
SUCCESSOR is cheaper than the current best path to OLD (since
SUCCESSOR and OLD are really the same node). So compare
the g values of OLD and SUCCESSOR. If OLD is cheaper (or just
as cheap), then we need do nothing. If SUCCESSOR is cheaper,
then reset OLD's backward pointer to BESTNODE, record the new
cheaper path in g(OLD), and update f(OLD).

2.4.4. If SUCCESSOR was not in OPEN, see if it is in CLOSED. If so,
call the node in CLOSED OLD and add OLD to the list of
BESTNODE's successors. Check to see if the new path or the old
path is better just as in step 2.4.3, and set the backward pointer
and g and f values appropriately. If we have just found a better
path to OLD, we must propagate the improvement to OLD's
successors. This is a bit tricky. OLD points to its successors. Each
successor in turn points to its successors, and so forth, until each
branch terminates with a node that either is still in OPEN or has
no successors. So, to propagate the new cost downward, do a
depth-first traversal of the search graph, starting at OLD, and
changing each node's g value (and thus also its f value), terminating
each branch when it reaches either a node with no successors
or a node to which an equivalent or better path had already been
found. Note that this condition doesn't just allow the propagation
to stop as soon as the new path ceases to make a difference to any
further node's cost. It also guarantees that the algorithm will
terminate even if there are cycles in the graph. If there is a cycle,
then the second time that a given node is visited, the path will be
no better than the first time and so propagation will stop.

2.4.5. If SUCCESSOR was not already in either OPEN or CLOSED,
then compute its h and f values, put it in OPEN, and add it to
the list of BESTNODE's successors.
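
The graph-search procedure can be sketched as follows. This is a simplified variant under
stated assumptions, not a transcription of the steps above. In particular, when a cheaper
path to a node in CLOSED is found, the sketch moves that node back to OPEN so that it will
be expanded again, instead of performing the depth-first propagation of step 2.4.4. Stale
heap entries are skipped rather than deleted. The example graph and heuristic values are
hypothetical.

```python
import heapq
from itertools import count


def a_star(start, is_goal, successors, h):
    """Graph-search A*: return (path, cost), or None if OPEN becomes empty."""
    g = {start: 0}          # cheapest known cost to each generated state
    parent = {start: None}  # backward pointers along the cheapest known paths
    closed = set()          # states that have been expanded
    tie = count()           # tie-breaker so the heap never compares states
    open_heap = [(h(start), next(tie), 0, start)]
    while open_heap:
        _f, _tie, g_entry, node = heapq.heappop(open_heap)
        if g_entry > g[node] or node in closed:
            continue        # stale entry: a cheaper path was recorded later
        closed.add(node)
        if is_goal(node):
            path = []
            while node is not None:      # follow the backward pointers
                path.append(node)
                node = parent[node]
            path.reverse()
            return path, g[path[-1]]
        for nxt, cost in successors(node):
            g_next = g[node] + cost
            if nxt in g and g_next >= g[nxt]:
                continue    # the old path is cheaper or just as cheap
            g[nxt] = g_next
            parent[nxt] = node           # reset the backward pointer
            closed.discard(nxt)          # reopen if it had already been expanded
            heapq.heappush(open_heap, (g_next + h(nxt), next(tie), g_next, nxt))
    return None


# Hypothetical example: A is first reached by the expensive direct edge from S.
GRAPH = {"S": [("A", 4), ("B", 1)], "B": [("A", 1)], "A": [("G", 5)], "G": []}
H = {"S": 0, "A": 0, "B": 5, "G": 0}  # never overestimates, but is not monotonic

if __name__ == "__main__":
    result = a_star("S", lambda s: s == "G", lambda s: GRAPH[s], lambda s: H[s])
    print(result)  # (['S', 'B', 'A', 'G'], 7)
```

In this run A is expanded with g = 4, then reached again through B with g = 2, reopened,
and expanded a second time, which yields the path of cost 7 rather than the one of cost 9.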

A*,  like  A*-tree,  is  complete;  it  will  find  a  path  if  one  exists.  If  h  is  admissible,  then 
A*  will  find  a  shortest  path.  And  the  graceful  decay  of  admissibility  principle  applies  to 
A* just as it does to A*-tree.

In addition, we can now say something about the efficiency with which A* finds a
shortest path. Let $c(n_1, n_2)$ be the cost of getting from $n_1$ to $n_2$. We'll say that
h(n) is monotonic iff, whenever $n_2$ is a successor of $n_1$ (meaning that it can be
derived from $n_1$ in exactly one move), $h(n_1) \le c(n_1, n_2) + h(n_2)$. If h is
monotonic, then A* is optimal in the sense that no other search algorithm that uses the
same heuristic function and that is guaranteed to find a cheapest path will do so by
examining fewer nodes than A* does. In particular, in this case it can be shown that A*
will never need to reexamine a node once it goes on CLOSED. So it is possible to skip
step 2.4.4.
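
For a small, explicitly listed state space, the monotonicity condition can be checked edge
by edge. The sketch below is illustrative only; the function name, the edge-list
representation, and the example values are our assumptions.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (n1, n2, cost of moving from n1 to n2)


def monotonicity_violations(edges: List[Edge], h: Dict[str, float]) -> List[Edge]:
    """Return the edges (n1, n2, c) for which h(n1) > c + h(n2)."""
    return [(n1, n2, c) for n1, n2, c in edges if h[n1] > c + h[n2]]


if __name__ == "__main__":
    edges = [("S", "A", 4), ("S", "B", 1), ("B", "A", 1), ("A", "G", 5)]
    exact = {"S": 7, "A": 5, "B": 6, "G": 0}   # the true remaining costs
    uneven = {"S": 0, "A": 0, "B": 5, "G": 0}  # never overestimates, yet not monotonic
    print(monotonicity_violations(edges, exact))   # []
    print(monotonicity_violations(edges, uneven))  # [('B', 'A', 1)]
```

The second heuristic never overestimates the true cost, yet it drops by 5 across an edge of
cost 1, so a node may have to be reexamined after it has been placed on CLOSED.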

Unfortunately, even with these claims, A* may not be good enough. Depending on
the shape of the state space and the accuracy of h, it may still be necessary to examine a
number of nodes that grows exponentially in the length of the cheapest path. However,
if the maximum error that h may make is small, the number of nodes that must be
examined grows only polynomially in the length of the cheapest path. More specifically,
polynomial growth is assured if:

$|h^*(n) - h(n)| \in O(\log h^*(n))$.

A* is just one member of a large family of heuristic search algorithms. See [Pearl
1984] or [Russell and Norvig 2002] for a discussion of others. For example, A*, like its
cousin, breadth-first search, uses a lot of space. There exist other algorithms that use less.


Generalized chess is provably intractable (since it is EXPTIME-complete).
Even the standard chess board is large enough that it isn't possible to search a
complete game tree to find a winning move. Yet champion chess programs exist.
They, along with programs that play other classic games like checkers and Go,
exploit a heuristic search algorithm called minimax, which we describe in N.2.5.


Exercises 

1. In Exercise 28.23, we defined a cut in a graph, the size of a cut, and a bisection. Let
G be a graph with 2v vertices and m edges. Describe a randomized, polynomial-time
algorithm that, on input G, outputs a cut of G with expected size at least
mv/(2v − 1). (Hint: Analyze the algorithm that takes a random bisection as its cut.)

2. Suppose that the A* algorithm has generated the following tree so far:

Assume that the nodes were generated in the order A, B, C, D, E. The expression
(g, h) associated with each node gives the values of the functions g and h at
that node.

a.  What  node  will  be  expanded  at  the  next  step? 

b.  Can  it  be  guaranteed  that  A*,  using  the  heuristic  function  h  that  it  is  using,  will 
find  an  optimal  solution?  Why  or  why  not? 

3.  Simple  puzzles  offer  a  way  to  explore  the  behavior  of  search  algorithms  such  as 
A*,  as  well  as  to  experiment  with  a  variety  of  heuristic  functions.  Pick  one  (for 
example  the  15-puzzle  of  Example  4.8  or  see  Q)  and  use  A*  to  solve  it.  Can  you 
find an admissible heuristic function that is effective at pruning the search space?

Summary and References


In Part IV, we saw that some problems are uncomputable in principle: For example, no
effort on the part of the engineers of the world can make the halting problem solvable.
In Part V, we've considered only problems that are computable in principle. But we've
seen that while some are computable in practice, others aren't, at least with the
techniques available to us today. In the years since the theory of NP-completeness was first
described, a substantial body of work has increased our understanding of the ways in
which some problems appear to require more computational resources than others.
But that work has left many questions unanswered. While it is known that not all of the
complexity classes that we have considered can collapse, it is unknown whether some
of the most important of them can. In particular, we are left with the Millennium
Problem, “Does P = NP?”


APPENDIX A

Review of Mathematical Background: Logic, Sets, Relations,
Functions, and Proof Techniques


Throughout this book, we rely on a collection of important mathematical concepts
and notations. We summarize them here. For a deeper introduction to these ideas,
see any good discrete mathematics text, for example [Epp 2003] or [Rosen 2003].

A.1 Logic

We assume familiarity with the standard systems of both Boolean and quantified logic,
so this section is just a review of the definitions and notations that we will use, along
with some of the most useful inference rules.

A.1.1 Boolean (Propositional) Logic

A proposition is a statement that has a truth value. The language of well-formed
formulas (wffs) allows us to define propositions whose truth can be determined
from the truth of other propositions. A wff is any string that is formed according to
the following rules.

• A propositional symbol (e.g., P) is a wff. (Propositional symbols are also called
variables, primarily because the term is shorter. We will generally find it convenient
to do that, but this use of the term should not be confused with its use in the
definition of first-order logic.)

• If P is a wff, then ¬P is a wff.

• If P and Q are wffs, then so are P ∨ Q, P ∧ Q, P → Q, and P ↔ Q.

• If P is a wff, then (P) is a wff.

Table A.1
A truth table for the common Boolean operators.

P      Q      ¬P     P ∨ Q  P ∧ Q  P → Q  P ↔ Q
True   True   False  True   True   True   True
True   False  False  True   False  False  False
False  True   True   True   False  True   False
False  False  True   False  False  True   True

Other binary operators, such as XOR (exclusive or) and NAND (not and), can also
be defined, but we will not need them.

The definitions of the operators are given by the truth table shown in Table A.1. It
shows how the truth value of a proposition can be computed from the truth values of
its components. (Note that the symbol ∨ means inclusive or.)

We  can  divide  the  set  of  all  Boolean  wffs  into  three  useful  categories,  as  a  function 
of  when  they  are  true: 

•  A  Boolean  wff  is  valid  if  and  only  if  it  is  true  for  all  assignments  of  truth  values  to 
the  variables  it  contains.  A  valid  wff  is  also  called  a  tautology. 

•  A  Boolean  wff  is  satisfiable  if  and  only  if  it  is  true  for  at  least  one  assignment  of 
truth  values  to  the  variables  it  contains. 

•  A  Boolean  wff  is  unsatisfiable  if  and  only  if  it  is  false  for  all  assignments  of  truth 
values  to  the  variables  it  contains. 


EXAMPLE A.1 Using a Truth Table

The wff P ∨ ¬P is a tautology (i.e., it is valid). We can easily prove this by extending
the truth table shown above and considering the only two possible cases (P is
True or P is False):

P      ¬P     P ∨ ¬P
True   False  True
False  True   True

The wff P ∨ ¬Q is satisfiable. It is True if either P is True or Q is False. It is not a
tautology, however. The wff P ∧ ¬P is unsatisfiable. It is False both in case P is
True and in case P is False.
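
The truth-table method used in this example can be mechanized. The following is a minimal sketch, assuming each wff is represented as a Python function of its variables; the name classify and the string labels it returns are illustrative choices, not notation from the text. Since every valid wff is also satisfiable, the function reports the strongest category that applies.

```python
from itertools import product

def classify(wff, num_vars):
    """Evaluate wff on every truth assignment and report its category."""
    results = [wff(*values)
               for values in product([True, False], repeat=num_vars)]
    if all(results):
        return "valid"          # true under every assignment (a tautology)
    if any(results):
        return "satisfiable"    # true under at least one assignment
    return "unsatisfiable"      # false under every assignment

print(classify(lambda P: P or not P, 1))      # valid
print(classify(lambda P, Q: P or not Q, 2))   # satisfiable
print(classify(lambda P: P and not P, 1))     # unsatisfiable
```

The table has 2ⁿ rows for a wff with n variables, so this brute-force check is practical only for small n.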


We'll say that two wffs P and Q are equivalent, which we will write as P ≡ Q, iff
they have the same truth values regardless of the truth values of the variables they
contain. So, for example, (P → Q) ≡ (¬P ∨ Q).


In interpreting wffs, we assume that ¬ has the highest precedence, followed by ∧,
then ∨, then →, then ↔. So:

(P ∨ Q ∧ R) ≡ (P ∨ (Q ∧ R)).

Parentheses  can  be  used  to  force  different  interpretations. 

The  following  properties  (defined  in  Section  A.4.3)  of  the  Boolean  operators  follow 
from  their  definitions  in  the  truth  table  given  above. 

• The operators ∨, ∧, and ↔ are commutative and associative.

• The operators ∨ and ∧ are idempotent (e.g., (P ∨ P) ≡ P).

• The operators ∨ and ∧ distribute over each other:

    • P ∧ (Q ∨ R) ≡ (P ∧ Q) ∨ (P ∧ R).

    • P ∨ (Q ∧ R) ≡ (P ∨ Q) ∧ (P ∨ R).

• Absorption laws:

    • P ∧ (P ∨ Q) ≡ P.

    • P ∨ (P ∧ Q) ≡ P.

• Double negation: ¬¬P ≡ P.

• de Morgan's Laws:

    • ¬(P ∧ Q) ≡ (¬P ∨ ¬Q).

    • ¬(P ∨ Q) ≡ (¬P ∧ ¬Q).
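
Each of these equivalences can be confirmed by comparing truth tables row by row. The sketch below assumes wffs are written as Python functions; the helper names equivalent and implies are hypothetical.

```python
from itertools import product

def implies(a, b):
    """Truth function for the conditional: false only when a is True and b is False."""
    return (not a) or b

def equivalent(f, g, num_vars):
    """True iff f and g agree on every assignment to their variables."""
    return all(f(*v) == g(*v)
               for v in product([True, False], repeat=num_vars))

# de Morgan's Laws
assert equivalent(lambda P, Q: not (P and Q),
                  lambda P, Q: (not P) or (not Q), 2)
assert equivalent(lambda P, Q: not (P or Q),
                  lambda P, Q: (not P) and (not Q), 2)

# Distribution of "or" over "and"
assert equivalent(lambda P, Q, R: P or (Q and R),
                  lambda P, Q, R: (P or Q) and (P or R), 3)

# Absorption
assert equivalent(lambda P, Q: P and (P or Q), lambda P, Q: P, 2)

# The conditional is not commutative: P -> Q differs from Q -> P.
assert not equivalent(implies, lambda P, Q: implies(Q, P), 2)
```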

We'll say that a set A of wffs logically implies or entails a conclusion Q iff, whenever
all of the wffs in A are true, Q is also true.

An axiom is a wff that is asserted a priori to be true. Given a set of axioms, rules of
inference can be applied to create new wffs, to which the inference rules can then be
applied,  and  so  forth.  Any  statement  so  derived  is  called  a  theorem.  Let  A  be  a  set  of 
axioms  plus  zero  or  more  theorems  that  have  already  been  derived  from  those 
axioms.  Then  a  proof  is  a  finite  sequence  of  applications  of  inference  rules,  starting 
from  A. 

A proof is a syntactic object. It is just a sequence of applications of rules. We would like,
however, for proofs to tell us something about truth. They can do that if we design our
inference rules appropriately. We'll say that an inference rule is sound iff, whenever it is
applied to a set A of axioms, any conclusion that it produces is entailed by A (i.e., it must be
true whenever A is). An entire proof is sound iff it consists of a sequence of inference steps
each of which was constructed using a sound inference rule. A set of inference rules R is
complete iff, given any set A of axioms, all statements that are entailed by A can be proved
by applying the rules in R. If we can define a set of inference rules that is both sound and
complete, then the set of theorems that can be proved from A will exactly correspond to
the set of statements that must be true whenever A is.

The  truth  table  we  presented  above  is  the  basis  for  the  construction  of  sound  and 
complete  inference  rules  in  Boolean  logic.  Some  useful  rules  are: 

• Modus ponens: From the premises (P → Q) and P, conclude Q.

• Modus tollens: From the premises (P → Q) and ¬Q, conclude ¬P.

• Or introduction: From the premise P, conclude (P ∨ Q).

• And introduction: From the premises P and Q, conclude (P ∧ Q).

• And elimination: From the premise (P ∧ Q), conclude P or conclude Q.

Any two statements of the form P and ¬P form a contradiction.
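
Soundness of a Boolean inference rule can be checked directly against the definition of entailment: the conclusion must be true in every row of the truth table in which all of the premises are true. The following sketch assumes premises and conclusions are Python functions of P and Q; entails and implies are hypothetical helper names. The last check shows that a plausible-looking but unsound rule (sometimes called affirming the consequent) fails the test.

```python
from itertools import product

def implies(a, b):
    return (not a) or b

def entails(premises, conclusion, num_vars):
    """True iff conclusion holds under every assignment that satisfies all premises."""
    return all(conclusion(*v)
               for v in product([True, False], repeat=num_vars)
               if all(p(*v) for p in premises))

# Modus ponens: from (P -> Q) and P, conclude Q.
assert entails([lambda P, Q: implies(P, Q), lambda P, Q: P],
               lambda P, Q: Q, 2)

# Modus tollens: from (P -> Q) and not Q, conclude not P.
assert entails([lambda P, Q: implies(P, Q), lambda P, Q: not Q],
               lambda P, Q: not P, 2)

# Or introduction: from P, conclude (P or Q).
assert entails([lambda P, Q: P], lambda P, Q: P or Q, 2)

# Not sound: from (P -> Q) and Q, one may not conclude P.
assert not entails([lambda P, Q: implies(P, Q), lambda P, Q: Q],
                   lambda P, Q: P, 2)
```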


A.1.2  First-Order  Logic 

The primitives in Boolean logic are predicates of no arguments (i.e., Boolean
constants). It is useful to extend our logical system to allow predicates of one or more
arguments and to allow the use of variables. So, for example, we might like to write
P(China) or Q(x, y). First-order logic, often called simply FOL (or sometimes
first-order predicate logic, first-order predicate calculus, or FOPC), allows us to do that.

We  will  use  symbols  that  start  with  lowercase  letters  as  variables  and  symbols  that 
start  with  uppercase  letters  as  constants,  predicates,  and  functions. 

An expression that describes an object is a term. So a variable is a term and an n-ary
function whose arguments are terms is also a term. Note that if n is 0, we have a constant.

We define the language of well-formed formulas (wffs) in first-order logic to be the
set of expressions that can be formed according to the following rules.

• If P is an n-ary predicate and each of the expressions x₁, x₂, …, xₙ is a term, then an
expression of the form P(x₁, x₂, …, xₙ) is a wff. If any variable occurs in such a wff,
then that variable is free (alternatively, it is not bound).

• If P is a wff, then ¬P is a wff.

• If P and Q are wffs, then so are P ∨ Q, P ∧ Q, P → Q, and P ↔ Q.

• If P is a wff, then (P) is a wff.

• If P is a wff, then ∀x (P) and ∃x (P) are wffs. Any free instance of x in P is bound by
the quantifier and is then no longer free. ∀ is called the universal quantifier and ∃ is
called the existential quantifier. In the wff ∀x (P) or ∃x (P), we'll call P the scope of
the quantifier. It is important to note that when an existentially quantified variable
y occurs inside the scope of a universally quantified variable x (as, for example, in
statement 4 below), the meaning of the wff is that for every value of x there exists
some value of y but it need not be the same value of y for every value of x. So, for
example, the following wffs are not equivalent:

    • ∀x (∃y (Father-of(y, x))), and

    • ∃y (∀x (Father-of(y, x))).
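
The difference between the two quantifier orders can be seen concretely over a small finite universe, where ∀ corresponds to Python's all and ∃ to any. In this sketch the universe {0, 1, 2} and the relation r are hypothetical stand-ins for people and Father-of, chosen only so that the two wffs get different truth values.

```python
universe = [0, 1, 2]

def r(y, x):
    """Hypothetical binary relation: y is related to x iff y = (x + 1) mod 3."""
    return y == (x + 1) % 3

# For every x there exists some y with r(y, x); y may depend on x.
forall_exists = all(any(r(y, x) for y in universe) for x in universe)

# There exists a single y such that r(y, x) holds for every x.
exists_forall = any(all(r(y, x) for x in universe) for y in universe)

print(forall_exists)  # True
print(exists_forall)  # False
```

Each x has its own witness y, but no one y works for all x, which is exactly the situation described above.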

For convenience, we will extend this syntax slightly. When no confusion will result,
we will allow the following additional forms for wffs:

• ∀x < c (P(x)) is equivalent to ∀x (x < c → P(x)).

• ∀x ∈ S (P(x)) is equivalent to ∀x (x ∈ S → P(x)).

• ∀x, y, z (P(x, y, z)) is equivalent to ∀x (∀y (∀z (P(x, y, z)))).

• ∀x, y, z ∈ S (P(x, y, z)) is equivalent to ∀x ∈ S (∀y ∈ S (∀z ∈ S (P(x, y, z)))).

The logical framework that we have just defined is called first-order because it allows
quantification over variables but not over predicates or functions. It is possible to define
higher-order logics that do permit such quantification. For example, in a higher-order
logic we might be able to say something like ∀P (P(John) → P(Carey)). In other
words, anything that is true of John is also true of Carey. While it is sometimes useful to
be able to make statements such as this, the computational and logical properties of
higher-order systems make them very hard to use except in some restricted cases.

A  wff  with  no  free  variables  is  called  a  sentence  or  a  statement.  All  of  the  following 
are  sentences. 

1. Bear(Smoky)

2. ∀x (Bear(x) → Animal(x))

3. ∀x (Animal(x) → Bear(x))

4. ∀x (Animal(x) → ∃y (Mother-of(y, x)))

5. ∀x ((Animal(x) ∧ ¬Dead(x)) → Alive(x))

A ground instance is a sentence that contains no variables. All of the following are
ground instances: Bear(Smoky), Animal(Smoky), and Mother-of(BigEyes, Smoky). In
computational logic systems, it is common to store the ground instances in a different
form than the one that is used for other sentences. They may be contained in a table or
a database, for example.

Returning to sentences 1-5 above, 1, 2, 4, and 5 are true in our everyday world
(assuming the obvious referent for the constant Smoky and the obvious meanings of
the predicates Bear, Animal, and Mother-of). On the other hand, 3 is not true.

As these examples show, determining whether or not a sentence is true requires
appeal to the meanings of the constants, functions, and predicates that it contains. An
interpretation for a sentence w is a pair (D, I). D is a universe of objects. I assigns
meaning to the symbols of w: It assigns values, drawn from D, to the constants in w and
it assigns functions and predicates (whose domains and ranges are subsets of D) to the
function and predicate symbols of w. A model of a sentence w is an interpretation that
makes w true. For example, let w be the sentence ∀x (∃y (y < x)). The integers (along
with the usual meaning of <) are a model of w since, for any integer, there exists some
smaller integer. The positive integers, on the other hand, are an interpretation for w but
not a model of it. The sentence w is false for the positive integers since there is no
positive integer that is smaller than 1.

A sentence w is valid iff it is true in all interpretations. In other words, w is valid iff
it is true regardless of what the constant, function, and predicate symbols “mean”. A
sentence w is satisfiable iff there exists some interpretation in which w is true. A
sentence w is unsatisfiable iff it is not satisfiable (in other words, there exists no
interpretation in which it is true). Any sentence w is valid iff ¬w is unsatisfiable.


EXAMPLE A.2 Valid, Satisfiable, and Unsatisfiable wffs

Let w₁ be the wff:

∀x ((P(x) ∧ Q(Smoky)) → P(x)).

The wff w₁ is valid because it is true regardless of what the predicates P and Q
are or what object Smoky refers to. It is also satisfiable since it is true in at least
one interpretation.

Let w₂ be the wff:

¬(∀x (P(x) ∨ ¬(P(x)))).

The wff w₂ is not valid. It is also unsatisfiable since it is false in all interpretations,
which follows from the fact that ¬w₂ is valid.

Finally, let w₃ be the wff:

∀x (P(x, x)).

The wff w₃ is not valid but it is satisfiable. Suppose that the universe is the
integers and P is the predicate LessThanOrEqualTo. Then P is true for all values of x.
But, again with the integers as the universe, suppose that P is the predicate
LessThan. Now P is false for all values of x. Finally, let the universe be the set of all
people and let P be the predicate HasConfidenceInTheAbilityOf. Now P is true of
some values of x (i.e., of those people who have self-confidence) and false of others.
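
The three interpretations of ∀x (P(x, x)) can be imitated in code. The sketch below is only an approximation of the example: a program cannot range over all of the integers, so a finite sample of them stands in for the universe, and the people and the confidence relation are invented for illustration. The function name holds_for_all is likewise hypothetical.

```python
import operator

def holds_for_all(universe, P):
    """Evaluate the sentence 'for all x, P(x, x)' over a finite universe."""
    return all(P(x, x) for x in universe)

sample_integers = range(-5, 6)   # finite stand-in for the integers

# P interpreted as LessThanOrEqualTo: the sentence is true.
print(holds_for_all(sample_integers, operator.le))   # True

# P interpreted as LessThan: the sentence is false.
print(holds_for_all(sample_integers, operator.lt))   # False

# P interpreted as a made-up confidence relation on two made-up people.
people = ["Ann", "Bob"]
has_confidence_in = {("Ann", "Ann"), ("Ann", "Bob")}
per_person = [(x, x) in has_confidence_in for x in people]
print(per_person)                                    # [True, False]
```

One interpretation makes the sentence true and another makes it false, which is what it means for the sentence to be satisfiable but not valid.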


A set A of axioms logically implies or entails a conclusion c iff, in every interpretation
in which A is true (i.e., in every model of A), and for all assignments of values to
the free variables of c, c must be true.

As in Boolean logic, a proof in first-order logic starts with a set A of axioms and
theorems that have already been proved from those axioms. Rules of inference are then
applied, creating new statements. Any statement derived in this way is called a
theorem. A proof is a finite sequence of applications of inference rules, starting from
the axioms and given theorems.

As in Boolean logic, we will say that an inference rule is sound iff, whenever it is
applied to a set A of statements (axioms and given theorems), any conclusion that it
produces is entailed by A (i.e., it must be true whenever A is). A set of inference rules R is
complete iff, given any set A of statements, all statements that are entailed by A can be
proved by applying the rules in R. As in Boolean logic, we seek a set of inference rules
that is both sound and complete.


Resolution is a single inference rule that is used as the basis for many automatic
theorem proving and reasoning programs. It is sound and refutation-complete.
By the latter, we mean that if ¬ST is inconsistent with the axioms and if
both the axioms and ¬ST have been converted to a restricted syntax called
clause form, resolution will find the inconsistency and thus prove ST. (B.2.2)


For Boolean logic, truth tables provide a basis for defining a set of sound and
complete inference rules. It is less obvious that such a set exists for first-order logic. But it
does, as was first shown by Kurt Gödel in his Completeness Theorem [Gödel 1929].
More specifically, Gödel showed that there exists some set of inference rules R such
that, given any set of axioms A and a sentence c, there is a proof of c, starting with A
and applying the rules in R, iff c is entailed by A. Note that all that we are claiming here
is that, if there is a proof, there is a procedure for finding it. We are not claiming that
there exists a procedure that decides whether or not a proof exists. In fact, as we show
in Section 22.4.2, for first-order logic no such decision procedure can exist.

All of the inference rules that we have presented and will present are sound. The individual
inference rules that we have so far considered are not, however, complete. For example,
modus ponens is incomplete. But a complete procedure can be constructed by including
all of the rules we listed above for Boolean logic, plus new ones, including, among others:

• Quantifier exchange:

    • From ¬∃x (P), conclude ∀x (¬P).

    • From ∀x (¬P), conclude ¬∃x (P).

    • From ¬∀x (P), conclude ∃x (¬P).

    • From ∃x (¬P), conclude ¬∀x (P).

• Universal instantiation: For any constant C, from ∀x (P(x)), conclude P(C).

• Existential generalization: For any constant C, from P(C), conclude ∃x (P(x)).
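
The quantifier exchange rules can be sanity-checked over a finite universe by trying every possible unary predicate on it. This is an illustration rather than a proof, since the rules are claimed for all universes; the three-element universe and the function name are assumptions made for the sketch.

```python
from itertools import product

def quantifier_exchange_holds(universe):
    """Check both exchange equivalences for every unary predicate on universe."""
    for values in product([True, False], repeat=len(universe)):
        P = dict(zip(universe, values))   # one possible predicate, as a table
        not_exists = not any(P[x] for x in universe)
        forall_not = all(not P[x] for x in universe)
        not_forall = not all(P[x] for x in universe)
        exists_not = any(not P[x] for x in universe)
        if not_exists != forall_not or not_forall != exists_not:
            return False
    return True

print(quantifier_exchange_holds([0, 1, 2]))   # True
```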


EXAMPLE A.3 A Simple Proof

Assume the following three axioms:

[1] ∀x (P(x) ∧ Q(x) → R(x)).

[2] P(X₁).

[3] Q(X₁).

We prove R(X₁) as follows:

[4] P(X₁) ∧ Q(X₁) → R(X₁).     (Universal instantiation, [1].)

[5] P(X₁) ∧ Q(X₁).             (And introduction, [2], [3].)

[6] R(X₁).                     (Modus ponens, [5], [4].)
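
The same derivation can be written so that a proof assistant checks each step. The following Lean 4 sketch is an illustration only: it takes axiom [3] to be Q(X₁), which is what step [5] requires, and the hypothesis names h1, h2, and h3 are arbitrary. Applying h1 to X₁ is universal instantiation, the pair ⟨h2, h3⟩ is and introduction, and the final function application is modus ponens.

```lean
example {α : Type} (P Q R : α → Prop) (X₁ : α)
    (h1 : ∀ x, P x ∧ Q x → R x)   -- axiom [1]
    (h2 : P X₁)                   -- axiom [2]
    (h3 : Q X₁)                   -- axiom [3], assumed
    : R X₁ :=
  h1 X₁ ⟨h2, h3⟩
```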

A first-order theory is a set of axioms and the set of all theorems that can be proved,
using a set of sound and complete inference rules, from those axioms. A theory is
logically complete iff, for every sentence P in the language of the theory, either P or
¬P is a theorem. A theory is consistent iff there is no sentence P such that both P and
¬P are theorems. If, on the other hand, there is such a sentence, then the theory
contains a contradiction and is inconsistent.

We are often interested in the relationship between a theory and some set of facts
that are true in some view we may have of the world (for example the facts of
arithmetic or the facts a robot needs in order to move around). Let w be a world plus an
interpretation (that maps logical objects to objects in the world). Now we can say that a
theory is sound with respect to w iff every theorem (in the theory) corresponds to a
fact that is true (in w). We say that a theory is complete with respect to w iff every fact
that is true (in w) corresponds to a theorem (in the theory). We will assume that any
first-order logic statement in the language of w is either true or false in the world that
w describes. So, if a theory is complete with respect to w it must be the case that, for
any sentence P, either P corresponds to a sentence that is true in w, in which case it is
a theorem, or P corresponds to a sentence that is false in w, in which case ¬P is a
theorem. So any theory that is complete with respect to an interpretation and a set of facts
is also logically complete.

By the way, while the language of first-order logic has the property that every
statement is either true or false in any world, not all languages share that property. For
example, English doesn't. Consider the English sentence, “The king of France has red
hair.” Is it true or false (in the world as we know it, given the standard meanings of the
words)? The answer is neither. It carries the (false) presupposition that there is a king
of France and then makes a claim about that individual. This problem disappears,
however, when we convert the English sentence into a related sentence in first-order logic.
We might try:

• ∃x (King-of(x, France) ∧ Haircolor-of(x, Red)): This sentence is false in the
world.

• ∀x (King-of(x, France) → Haircolor-of(x, Red)): This sentence is true in the
world (trivially, since there are no values of x for which King-of(x, France) is true).
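
Python's any and all behave the same way on an empty collection, which makes the contrast between the two translations easy to see. The records below are entirely made up for illustration; the only property that matters is that no record has France as its kingdom.

```python
# Hypothetical world: nobody in it is king of France.
people = [
    {"name": "Ann", "king_of": None, "hair": "Red"},
    {"name": "Bob", "king_of": "Spain", "hair": "Brown"},
]

# Existential reading: some x is king of France and has red hair.
exists_reading = any(p["king_of"] == "France" and p["hair"] == "Red"
                     for p in people)

# Universal reading: every x that is king of France has red hair.
forall_reading = all(p["hair"] == "Red"
                     for p in people if p["king_of"] == "France")

print(exists_reading)   # False
print(forall_reading)   # True (vacuously: there is no king of France to check)
```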

There  are  interesting  first-order  theories  that  are  both  consistent  and  complete  with 
respect  to  particular  interpretations  of  interest.  One  example  is  Presburger  arithmetic, 
in  which  the  universe  is  the  natural  numbers  and  there  is  a  single  function,  plus,  whose 
properties  are  axiomatized.  There  are  other  theories  that  are  incomplete  because  we 
have  not  yet  added  enough  axioms.  But  it  might  be  possible,  eventually,  to  find  a  set  of 
axioms  that  does  the  job. 

However, many interesting and powerful theories are not both consistent and
complete and they will never become so. For example, Gödel's Incompleteness Theorem
[Gödel 1931], one of the most important results in modern mathematics, shows that
any theory that is derived from a decidable (a notion that we explain in Chapter 17) set
of axioms and that characterizes the standard behavior of the constants 0 and 1, plus
the functions plus and times on the natural numbers, cannot be both consistent and
complete. In other words, if any such theory is consistent (and it is generally assumed
that the standard theory of arithmetic is), then there must be some statements that are
true (in arithmetic) but not provable (in the theory). While it is of course possible to
add new axioms and thus make more statements provable, there will always remain
some true but unprovable statements unless either the set of axioms becomes
inconsistent or it becomes infinite and undecidable. In the latter case, the fact that a proof
exists is not very useful since it has become impossible to tell whether or not a statement
is an axiom and thus can be used in a proof.

Do  not  be  confused  by  the  fact  that  there  exists  both  a  Completeness  Theorem  and 
an  Incompleteness  Theorem.  The  terminology  is  unfortunate  since  it  is  based  on  two 
different  notions  of  completeness.  The  Completeness  Theorem  states  a  fact  about  the 
framework of first-order logic itself. It says that there exists a set of inference rules
(and, in fact, more than one such set happens to exist) such that, given any set A of
axioms, the theorems that are provable from A are exactly the set of sentences that are
entailed  by  A.  The  Incompleteness  Theorem  states  a  fact  about  theories  that  can  be 
built  within  any  logical  framework.  It  says  that  there  exist  theories  (the  standard  one 
about  arithmetic  with  plus  and  times  being  one  example)  that,  assuming  consistency, 
are  incomplete  in  the  sense  that  there  are  sentences  that  are  true  in  the  world  but  that 
are  not  theorems.  Such  theories  are  also  logically  incomplete:  There  exist  sentences  P 
such that neither P nor ¬P is a theorem.


A.2  Sets 

Most  of  the  structures  that  we  will  consider  are  based  on  the  fundamental  notion  of  a  set. 

A.2.1  What  is  a  Set? 

A  set  is  simply  a  collection  of  objects.  The  objects  (which  we  call  the  elements  or 
members  of  the  set)  can  be  anything:  numbers,  people,  strings,  fruits,  etc.  For  example, 
all  of  the  following  are  sets. 

• S₁ = {13, 11, 8, 23}

• S₂ = {8, 23, 11, 13}

• S₃ = {8, 8, 23, 23, 11, 11, 13, 13}

• S₄ = {apple, pear, banana, grape}

• S₅ = {January, February, March, April, May, June, July, August,
September, October, November, December}

• S₆ = {x : x ∈ S₅ and x has 31 days}

• S₇ = {January, March, May, July, August, October, December}

• ℕ = the nonnegative integers (also called the natural numbers)

• S₈ = {i : ∃x ∈ ℕ (i = 2x)}

• S₉ = {0, 2, 4, 6, 8, …}

• S₁₀ = the even natural numbers

• S₁₁ = the syntactically valid C programs

• S₁₂ = {x : x ∈ S₁₁ and x never gets into an infinite loop}

• S₁₃ = {finite length strings of a's and b's}

• ℤ = the integers (…, −3, −2, −1, 0, 1, 2, 3, …)

In the definitions of S₆, S₈, and S₁₂, we have used the colon notation. Read it as “such
that.” So, for example, read the definition of S₆ as, “the set of all values x such that x is an
element of S₅ and x has 31 days.” We have used the standard symbol ∈ for “element
of.” We will also use ∉ for “not an element of.” So, for example, 17 ∉ S₁ is true.

Remember that a set is simply a collection of elements. So if two sets contain
precisely the same elements (regardless of the way we actually defined the sets), then they
are identical. Thus S₆ and S₇ are the same set, as are S₈, S₉, and S₁₀.

Since a set is defined only by what elements it contains, the order in which we list its
elements does not matter. Thus S₁ and S₂ are the same set. Also note that a given element
is either in a set or it isn't. Duplicates do not matter. So sets S₁, S₂, and S₃ are equal.
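
Python's built-in set type follows the same rules, so the finite examples above can be checked directly. In this sketch the variable names mirror S₁ through S₇, and the dictionary of month lengths is an assumption added for the illustration (February is given 28 days, ignoring leap years).

```python
S1 = {13, 11, 8, 23}
S2 = {8, 23, 11, 13}
S3 = {8, 8, 23, 23, 11, 11, 13, 13}

# Order and duplicates do not matter.
assert S1 == S2 == S3
assert len(S3) == 4
assert 17 not in S1

# Month lengths, assuming a non-leap year.
days_in = {
    "January": 31, "February": 28, "March": 31, "April": 30,
    "May": 31, "June": 30, "July": 31, "August": 31,
    "September": 30, "October": 31, "November": 30, "December": 31,
}
S5 = set(days_in)

# The colon notation corresponds to a set comprehension.
S6 = {x for x in S5 if days_in[x] == 31}
S7 = {"January", "March", "May", "July", "August", "October", "December"}

# Two different definitions, one set.
assert S6 == S7
```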

One useful technique for describing a set S that is a subset of an existing set D is to
define a function (we'll define formally what we mean by a function in Section A.4)
that can be used to determine whether or not a given element is in S. Such a function is
called a characteristic function. Formally, a function f with domain D is a characteristic
function for a set S iff f(x) = True if x is an element of S and False otherwise. For
example, we used this technique to define set S₆.

We  can  use  programs  to  define  sets.  There  are  two  ways  to  use  a  program  to  define 
a  set  S: 

• Write a program that generates the elements of S. We call the output of such a
program an enumeration of S.

• Write a program that decides S by implementing the characteristic function of S.
Such a program returns True if run on some element that is in S and False if run on
an element that is not in S.
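
Both styles can be illustrated with the set of even natural numbers. The sketch below is one possible rendering, with hypothetical function names: a generator plays the role of the enumerating program (it never terminates on its own, so the example takes only a finite prefix of its output), and a Boolean function plays the role of the deciding program.

```python
from itertools import count, islice

def enumerate_evens():
    """Generate the even natural numbers 0, 2, 4, ... without end."""
    for x in count(0):
        yield 2 * x

def is_even_natural(i):
    """Characteristic function of the even natural numbers over the integers."""
    return i >= 0 and i % 2 == 0

# A finite prefix of the enumeration.
print(list(islice(enumerate_evens(), 5)))   # [0, 2, 4, 6, 8]

# Deciding membership for individual elements.
print(is_even_natural(6))    # True
print(is_even_natural(7))    # False
print(is_even_natural(-4))   # False
```

Every value produced by the enumerator is accepted by the decider, and the decider halts on every integer input, which is what distinguishes deciding a set from merely enumerating it.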

It  seems  natural  to  ask.  given  some  set  S,  “What  is  the  size  of  S'?"  or  “How  many 
elements  does  S  contain?”  We  will  use  the  term  cardinality  to  describe  the  way  we 
answer  such  questions.  So  we'll  reply  that  the  cardinality  of  S.  written  |S|,  is  «,for  some 
appropriate  value  of  n.  For  example.  |  {2, 7. 1 1 }  |  =  3.  In  simple  cases,  determining  the 
cardinality  of  a  set  is  straightforward.  In  other  cases,  it  is  more  complicated.  For  our 
purposes,  however,  we  can  get  by  with  three  different  kinds  of  answers: 

•  a  natural  number  (if  S  is  finite), 

•  “countably  infinite’’  (if  S  has  the  same  number  of  elements  as  there  are  integers),  or 

•  “uncountably  infinite”  (if  S  has  more  elements  than  there  are  integers). 

We  will  formalize  these  ideas  in  Section  A.6.8. 

The smallest set is the unique set that contains no elements. It is called the empty set,
and is written ∅ or { }. The cardinality of the empty set, written |∅|, is 0.

When  we  are  working  with  sets,  it  is  very  important  to  keep  in  mind  the  difference 
between a set and the elements of a set. Given a set that contains more than one
element, this distinction is usually obvious. It is clear that {1, 2} is distinct from either the
number 1 or the number 2. It sometimes becomes a bit less obvious, though, with
singleton sets (sets that contain only a single element). But it is equally true for them.
So, for example, {1} is distinct from the number 1. As another example, consider {∅}.
This is a set that contains one element. That element is in turn a set that contains no
elements (i.e., the empty set). {{1, 2, 3}} is also a set that contains one element.

A.2.2  Relating  Sets  to  Each  Other 

We say that A is a subset of B (which we write as A ⊆ B) iff every element of A is also
an element of B. Formally, A ⊆ B iff ∀x ∈ A (x ∈ B).

The symbol we use for subset (⊆) looks somewhat like ≤. This is no accident. If
A ⊆ B, then there is a sense in which the set A is "less than or equal to" the set B, since
all the elements of A must be in B, but there may be elements of B that are not in A.

Given this definition, notice that every set is a subset of itself. This fact turns out to
offer a useful way to prove that two sets A and B are equal: First prove that A is a
subset of B. Then prove that B is a subset of A. We will have more to say about this later in
Section A.6.7.

We say that A is a proper subset of B (written A ⊂ B) iff A ⊆ B and A ≠ B. The Venn
diagram shown in Figure A.1(a) illustrates the proper subset relationship between A
and B. Notice that the empty set is a subset of every set (since, trivially, every element
of ∅, all none of them, is also an element of every other set). And the empty set is a
proper subset of every set other than itself.

It  is  useful  to  define  some  basic  operations  that  can  be  performed  on  sets. 

• The union of two sets A and B (written A ∪ B) contains all elements that are
contained in A or B (or both). Formally, A ∪ B = {x: (x ∈ A) ∨ (x ∈ B)}. We can
easily visualize union using a Venn diagram, as shown in Figure A.1(b). The union of
sets A and B is the entire hatched area in the diagram.

• The intersection of two sets A and B (written A ∩ B) contains all elements that are
contained in both A and B. Formally, A ∩ B = {x: (x ∈ A) ∧ (x ∈ B)}. In the Venn
diagram shown in Figure A.1(b), the intersection of A and B is the double hatched
area in the middle.

• The difference of two sets A and B (written A - B or A/B) contains all elements that
are contained in A but not in B. Formally, A/B = {x: (x ∈ A) ∧ (x ∉ B)}. In the Venn
diagrams shown in Figure A.1(c) and (d), the hatched region represents A/B.


FIGURE  A.1  Venn  diagrams 
that  illustrate  relations  and 
functions  on  sets. 




• The complement of a set A with respect to a specific universe U (written as ¬A)
contains exactly those elements of U that are not contained in A (i.e., ¬A = U - A).
Formally, ¬A = {x: (x ∈ U) ∧ (x ∉ A)}. For example, if U is the set of residents of
Austin and A is the set of Austin residents who like barbeque, then ¬A is the set of
Austin residents who don't like barbeque. The complement of A is shown as the
hatched region of the Venn diagram shown in Figure A.1(e).

• Two sets are disjoint iff they have no elements in common (i.e., their intersection is
empty). Formally, A and B are disjoint iff A ∩ B = ∅. In the Venn diagram shown
in Figure A.1(f), A and B are disjoint.
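
The operations above map directly onto Python's built-in set type. The following is a small sketch; the particular sets A, B, and the universe U are assumptions chosen only for illustration.

```python
A = {1, 2, 3}
B = {3, 4}
U = {1, 2, 3, 4, 5}  # an assumed universe, needed only for the complement

union = A | B                     # elements in A or B (or both)
intersection = A & B              # elements in both A and B
difference = A - B                # elements in A but not in B
complement_of_A = U - A           # elements of U that are not in A
disjoint = A.isdisjoint({4, 5})   # True iff the intersection is empty
```

Note that the complement only makes sense once a universe has been fixed, which is why U appears explicitly.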

Given a set A, we can consider the set of all subsets of A. We call this set the power
set of A, and we write it 𝒫(A). For example, let A = {1, 2, 3}. Then:

𝒫(A) = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}.

The  power  set  of  A  is  interesting  because,  if  we're  working  with  the  elements  of  A, 
we  may  well  care  about  all  the  ways  in  which  we  can  combine  those  elements. 
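
As a rough sketch, the power set of a small finite set can be computed in Python by collecting the combinations of every size. The function name power_set is an assumption for this example; frozenset is used because Python sets cannot contain ordinary (mutable) sets.

```python
from itertools import chain, combinations

def power_set(a):
    """Return the set of all subsets of the finite set a."""
    items = list(a)
    # Subsets of size 0, 1, ..., len(items), chained into one stream.
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}

P = power_set({1, 2, 3})
```

For a set with n elements the result has 2^n members, so this is practical only for small sets.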

Now  for  one  final  property  of  sets.  Again  consider  the  set  A  above.  But  this  time, 
rather  than  looking  for  all  possible  subsets,  let’s  just  look  for  a  single  way  to  carve  A  up 
into  subsets  such  that  each  element  of  A  is  in  precisely  one  subset.  For  example,  we 
might  choose  any  of  the  following  sets  of  subsets: 

{{1}, {2, 3}} or {{1, 3}, {2}} or {{1, 2, 3}}.

We call any such set of subsets a partition of A. Partitions are very useful. For
example, suppose we have a set S of students in a school. We need for every student to be
assigned to precisely one lunch period. Thus we must construct a partition of S: a set of
subsets, one for each lunch period, such that each student is in precisely one subset.
More formally, we say that Π ⊆ 𝒫(A) is a partition of a set A iff:

• no element of Π is empty,

• all pairs of elements of Π are disjoint (alternatively, each element of A is in at most
one element of Π), and

• the union of all the elements of Π equals A (alternatively, each element of A is in
some element of Π and no element not in A is in any element of Π).
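
The three conditions can be checked mechanically for finite sets. Here is a minimal sketch; the function name is_partition and the list-of-blocks representation are assumptions made for the example.

```python
def is_partition(blocks, a):
    """Return True iff blocks (a collection of sets) is a partition of the set a."""
    blocks = [set(b) for b in blocks]
    # Condition 1: no block is empty.
    if any(len(b) == 0 for b in blocks):
        return False
    # Condition 2: all pairs of distinct blocks are disjoint.
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            if blocks[i] & blocks[j]:
                return False
    # Condition 3: the union of the blocks is exactly a.
    return set().union(*blocks) == set(a)
```

For instance, is_partition([{1}, {2, 3}], {1, 2, 3}) holds, while [{1}, {2}] fails the third condition because 3 is left out.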

This notion of partitioning a set is fundamental to programming. Every time we
analyze the set of possible inputs to a program and consider the various cases that must
be  dealt  with,  we  are  forming  a  partition  of  the  set  of  inputs:  Each  input  must  fall 
through  precisely  one  path  in  your  program.  So  it  should  come  as  no  surprise  that,  as 
we  build  formal  models  of  computational  devices,  we  will  rely  heavily  on  the  idea  of  a 
partition  of  a  set  of  inputs  as  an  analytical  technique. 


A.3  Relations 

In  the  last  section,  we  introduced  some  simple  relations  that  can  hold  between  sets 
(subset  and  proper  subset)  and  we  defined  some  operations  (functions)  on  sets  (union, 
intersection,  difference,  and  complement).  But  we  haven’t  yet  defined  formally  what 
we mean by a relation or a function. We will do that now. (By the way, the reason we
introduced  relations  and  functions  on  sets  in  the  last  section  is  that  we  are  going  to 
use  sets  as  the  basis  for  our  formal  definitions  of  relations  and  functions  and  we  will 
need  the  simple  operations  we  just  described  as  part  of  our  definitions.) 


A.3.1  What  is  a  Relation? 

An  ordered  pair  is  a  sequence  of  two  objects.  Given  any  two  objects,  x  and  y,  there  are 
two  ordered  pairs  that  can  be  formed.  We  write  them  as  (x,  y)  and  (y,  x).  As  the  name 
implies, in an ordered pair (as opposed to in a set), order matters (unless x and y
happen to be equal).

The Cartesian product of two sets A and B (written A × B) is the set of all
ordered pairs (a, b) such that a ∈ A and b ∈ B. For example, let A be a set of people:
{Dave, Sara, Billy}, and let B be a set of desserts: {cake, pie, ice cream}. Then:

A × B = { (Dave, cake), (Dave, pie), (Dave, ice cream),

(Sara,  cake),  (Sara,  pie),  (Sara,  ice  cream), 

(Billy,  cake),  (Billy,  pie),  (Billy,  ice  cream)}. 

As you can see from this example, the Cartesian product of two sets contains
elements that represent all the ways of pairing some element from the first set with some
element from the second. Note that A × B is not the same as B × A. In our example:

B × A = { (cake, Dave), (pie, Dave), (ice cream, Dave),

(cake,  Sara),  (pie,  Sara),  (ice  cream,  Sara), 

(cake,  Billy),  (pie,  Billy),  (ice  cream,  Billy)}. 

If A and B are finite, then the cardinality of their Cartesian product is given by:

|A × B| = |A| · |B|.
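
As a quick check of these claims, the example product can be built in Python with itertools.product. This is only a sketch using the people and desserts from the text.

```python
from itertools import product

A = {"Dave", "Sara", "Billy"}
B = {"cake", "pie", "ice cream"}

AxB = set(product(A, B))  # all ordered pairs (a, b) with a in A and b in B
BxA = set(product(B, A))  # the pairs reversed: a different set

size_matches = len(AxB) == len(A) * len(B)
```

Because the pairs are ordered, ("Dave", "cake") belongs to A × B but not to B × A.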

A binary relation over two sets A and B is a subset of A × B. For example, let's
consider the problem of choosing dessert. We could define a relation that tells us, for each
person, what desserts he or she likes. We might write the Dessert relation, for example, as:

Dessert = {(Dave, cake), (Dave, ice cream), (Sara, pie), (Sara, ice cream)}.

In other words, Dave likes cake and ice cream, Sara likes pie and ice cream, and Billy
seems not to like sinful treats.

Not all relations are binary. We define an n-ary relation over sets A₁, A₂, ..., Aₙ to
be a subset of A₁ × A₂ × ... × Aₙ. The n sets may be different, or they may be the
same. For example, let A be a set of people:

A  =  {Dave,  Sara,  Billy,  Beth,  Mark,  Cathy,  Pete}. 

Now suppose that Sara and Dave are the parents of Billy, Beth and Mark are the
parents of Cathy, and Billy and Cathy are the parents of Pete. Then we could define
a 3-ary (or ternary) relation Child-of, where the first element of each 3-tuple is the
mother, the second is the father, and the third is the child. So we would have the
following subset of A × A × A:

{(Sara, Dave, Billy), (Beth, Mark, Cathy), (Cathy, Billy, Pete)}.

Notice a couple of important properties of relations as we have defined them. First,
a relation may be equal to the empty set. For example, if Dave, Sara, and Billy all hate
dessert, then the Dessert relation would be { } or ∅.

Second, there are no constraints on how many times a particular element may occur in
a relation. In the Dessert example, Dave occurs twice, Sara occurs twice, Billy doesn't
occur at all, cake occurs once, pie occurs once, and ice cream occurs twice. Given an n-ary
relation R, we'll use the notation R(x₁, ..., xₙ₋₁) for the set that contains those elements
xₙ with the property that (x₁, ..., xₙ₋₁, xₙ) ∈ R. So, for example, Dessert(Dave)
= {cake, ice cream}.
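
For a binary relation stored as a set of pairs, this notation corresponds to a simple lookup. The sketch below assumes that representation; the helper name image is hypothetical.

```python
dessert = {
    ("Dave", "cake"), ("Dave", "ice cream"),
    ("Sara", "pie"), ("Sara", "ice cream"),
}

def image(relation, x):
    """Return the set of all b such that (x, b) is in the binary relation."""
    return {b for (a, b) in relation if a == x}
```

Here image(dessert, "Dave") plays the role of Dessert(Dave), and image(dessert, "Billy") is empty because Billy does not occur in the relation.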

An n-ary relation R is a subset of the cross product of n sets. The sets may all be
different, or some of them may be the same. In the specific case in which all the sets are
the same, we will say that R is a relation on the set A.

Binary relations are particularly useful and are often written in the form x₁ R x₂.
Common binary relations include = (equality, defined on many domains), < (defined on
numbers and some other domains), and ≤ (also defined on numbers and some other
domains). For example, the relation < on the integers contains an infinite number of
elements drawn from the Cartesian product of the set of integers with itself. For
instance, 2 < 7.

The inverse of a binary relation R, written R⁻¹, is simply the set of ordered pairs
in R with the elements of each pair reversed. Formally, if R ⊆ A × B, then
R⁻¹ ⊆ B × A = {(b, a): (a, b) ∈ R}. If a relation is a way of associating with each
element of A a corresponding element of B, then think of its inverse as a way of
associating with elements of B their corresponding elements in A. Every relation has an
inverse. For example, the inverse of < (in the usual sense, defined on numbers) is >.

If we have two or more binary relations, we may be able to combine them via an
operation we'll call composition. For example, if we knew the number of fat grams in a
serving of each kind of dessert, we could ask for the number of fat grams in a particular
person's dessert choices. To compute this, we first use the Dessert relation to find all the
desserts each person likes. Next we get the bad news from the Fatgrams relation, which
probably looks something like this:

{(cake, 30), (pie, 25), (ice cream, 15)}.

Finally, we see that the composed relation that relates people to fat grams is {(Dave, 30),
(Dave, 15), (Sara, 25), (Sara, 15)}. Of course, this only worked because, when we applied
the first relation, we got back desserts. Then our second relation has desserts as its first
component. We couldn't have composed Dessert with Less than, for example.

Formally, we say that the composition of two relations R₁ ⊆ A × B and R₂ ⊆ B × C,
written R₂ ∘ R₁, is:

R₂ ∘ R₁ = {(a, c): ∃b ((a, b) ∈ R₁ ∧ (b, c) ∈ R₂)}.

Note that this definition tells us that, to compute R₂ ∘ R₁, we first apply R₁, then R₂.
In other words, we go right to left. Some definitions go the other way. Obviously we can
define it either way, but it is important to be consistent. Using the notation we have just
defined, we can represent the people to fat grams composition described above as
Fatgrams ∘ Dessert.
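
Both the inverse and the composition of finite binary relations can be sketched in a few lines of Python, assuming relations are stored as sets of pairs. The function names inverse and compose are chosen for this example; compose takes its arguments in the same right-to-left order as the notation.

```python
def inverse(r):
    """Return the inverse of r: every pair with its elements reversed."""
    return {(b, a) for (a, b) in r}

def compose(r2, r1):
    """Return r2 composed with r1: apply r1 first, then r2."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

dessert = {
    ("Dave", "cake"), ("Dave", "ice cream"),
    ("Sara", "pie"), ("Sara", "ice cream"),
}
fatgrams = {("cake", 30), ("pie", 25), ("ice cream", 15)}

people_to_fat = compose(fatgrams, dessert)
```

Swapping the arguments, compose(dessert, fatgrams), yields the empty relation, since fat-gram counts are never the first component of a Dessert pair.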


A.3.2  Representing  Binary  Relations 

Binary  relations  are  particularly  important.  If  we’re  going  to  work  with  them,  and,  in 
particular,  if  we  are  going  to  compute  with  them,  we  need  some  way  to  represent  them. 
We  have  several  choices.  To  represent  some  binary  relation  R,  we  could: 

1.  List  the  elements  of  R.  For  example,  consider  the  Mother-of  relation  in  a  family  in 
which Doreen is the mother of Ann, Ann is the mother of Catherine, and Catherine
is the mother of Allison. Then we can write:

Mother-of  =  {(Doreen,  Ann),  (Ann,  Catherine),  (Catherine,  Allison)}. 

Clearly,  this  approach  only  works  for  finite  relations. 

2.  Encode  R  as  a  computational  procedure.  As  with  any  set,  there  are  at  least  two 
ways  in  which  a  computational  procedure  can  define  R.  It  may: 

• enumerate the elements of R, or

• implement the characteristic function for R by returning True when given a
pair that is in R and False when given anything else.

3. Encode R as an adjacency matrix. Assuming a finite relation R ⊆ A × B, we can
build an adjacency matrix to represent R as follows:

• Construct a Boolean matrix M (i.e., a matrix all of whose values are True or
False) with |A| rows and |B| columns.

• Label each row for one element of A and each column for one element of B.

• For each element (p, q) of R, set M[p, q] to True. Set all other elements of M
to False.

If we let 1 represent True and blank represent False, the adjacency matrix shown
in Table A.2 represents the relation Mother-of defined above.


Table A.2 Representing a relation as an adjacency matrix.

             Doreen   Ann   Catherine   Allison
Doreen                 1
Ann                             1
Catherine                                  1
Allison

4. Encode R as a directed graph. If R is a relation on the set A, we can build a
directed graph to represent R as follows:

• Construct a set of vertices (often called nodes), one for each element of A
that appears in any element of R.

• For each ordered pair in R, draw an edge from the first element of the pair to
the second.

The directed graph shown in Figure A.2(a) represents our example relation
Mother-of defined above. If there are two elements x and y, and both (x, y) and (y, x) are in R,
we will usually draw the graph as shown in Figure A.2(b). The directed graph technique
can also be used if R is a relation over two different sets A and B. But in this case, we
must construct vertices for elements of A and for elements of B. So, for example, we
could represent a Fatgrams relation as shown in Figure A.2(c).
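
The matrix and graph representations can be sketched in Python for the Mother-of example. The list-of-lists matrix and the dictionary of successor sets used here are assumptions about representation, not the only reasonable choices.

```python
people = ["Doreen", "Ann", "Catherine", "Allison"]
mother_of = {("Doreen", "Ann"), ("Ann", "Catherine"), ("Catherine", "Allison")}

# Adjacency matrix: one row and one column per person, all entries False at first.
index = {name: i for i, name in enumerate(people)}
matrix = [[False] * len(people) for _ in people]
for p, q in mother_of:
    matrix[index[p]][index[q]] = True

# Directed graph: map each vertex to the set of vertices its edges point to.
graph = {}
for p, q in mother_of:
    graph.setdefault(p, set()).add(q)
    graph.setdefault(q, set())  # make sure the target also appears as a vertex
```

The matrix gives constant-time membership tests, while the graph form stores only the pairs that are actually present.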


A.3.3  Properties  of  Binary  Relations  on  Sets 

Many useful binary relations have some kind of structure. For example, it might be
the case that every element of the underlying set is related to itself. Or it might
happen that if x is related to y, then y must necessarily be related to x. There is one
special kind of relation, called an equivalence relation, that is particularly useful. But
before we can define it, we need first to define each of the individual properties that
equivalence relations possess.

A relation R ⊆ A × A is reflexive iff ∀x ∈ A ((x, x) ∈ R). For example, consider the
relation Address defined as "lives at same address as". We will make the simplifying
assumption that everyone has only one address. Address is a relation over a set of people.
Clearly every person lives at the same address as him or herself, so Address is reflexive.


FIGURE A.2 Representing relations as graphs.

FIGURE A.3 Representing a reflexive relation.


So is the ≤ relation on the integers. For every integer x, x ≤ x. But the < relation is
not reflexive: In fact, for no integer x is x < x. Both the directed graph and the matrix
representations make it easy to tell if a relation is reflexive. In the graph representation,
every vertex will have, at a minimum, an edge looping back to itself. In the adjacency
matrix representation, there will be ones all along the major diagonal, and possibly
elsewhere as well. Figure A.3 illustrates both cases.

A relation R ⊆ A × A is symmetric iff ∀x, y ((x, y) ∈ R → (y, x) ∈ R). The Address
relation we described above is symmetric. If Joanna lives with Ann, then Ann lives with
Joanna. The ≤ relation is not symmetric (since, for example, 2 ≤ 3, but it is not true
that 3 ≤ 2). The graph representation of a symmetric relation has the property that,
between any two vertices, either there is an arrow going in both directions or there is
no arrow going in either direction. So we get graphs with components that look like the
one shown in Figure A.4(a). If we choose the matrix representation, we will end up
with a symmetric matrix (i.e., if you flip it on its major diagonal, you'll get the same
matrix back again). In other words, if we have a matrix with 1's as shown in the matrix of
Figure A.4(b), then there must also be 1's in all the squares marked with an * in that
matrix.

A relation R ⊆ A × A is antisymmetric iff ∀x, y (((x, y) ∈ R ∧ x ≠ y) → (y, x) ∉ R).
The Mother-of relation we described above is antisymmetric: If Ann is the mother of
Catherine, then one thing we know for sure is that Catherine is not also the mother of
Ann. Our Address relation is clearly not antisymmetric. There are, however, relations
that are neither symmetric nor antisymmetric. One example is the Likes relation on
the set of people: If Joe likes Bob, then it is possible that Bob likes Joe; it is also
possible that he doesn't. Note that antisymmetric is not the same as not symmetric. The
relation ∅ is both symmetric and antisymmetric.

A relation R ⊆ A × A is transitive iff ∀x, y, z (((x, y) ∈ R ∧ (y, z) ∈ R) → (x, z) ∈ R).
A simple example of a transitive relation is <. Address is another one: If Bill lives with
FIGURE A.4 Representing a symmetric relation.

Table A.3 Important properties of relations.

Properties                      Domain               Example
None                            People               Mother-of
Just reflexive                  People who can see   Would-recognize-picture-of
Just symmetric                  People               Has-ever-been-married-to
Just transitive                 People               Ancestor-of
Just reflexive and symmetric    People               Hangs-out-with (assuming we can say one hangs out with oneself)
Just reflexive and transitive   Numbers              ≤
Just symmetric and transitive   Anything             ∅
All three                       Numbers              =
All three                       People               Address

Stacy and Stacy lives with Lee, then Bill lives with Lee. Mother-of is not transitive. But if
we change it slightly to Ancestor-of, then we get a transitive relation. If Doreen is an
ancestor of Ann and Ann is an ancestor of Catherine, then Doreen is an ancestor of
Catherine.

The three properties of reflexivity, symmetry, and transitivity are almost logically
independent of each other. We can find simple, potentially useful relations that possess
seven of the eight possible combinations of these properties. We show them in Table A.3
(which we'll extend to include antisymmetry in Exercise A.10).

To see why we can't find a nontrivial (i.e., different from ∅) example of a relation
that is symmetric and transitive but not reflexive, consider a simple relation R on
{1, 2, 3, 4}. As soon as R contains a single element that relates two unequal objects
(e.g., (1, 2)), it must, for symmetry, contain the matching element (2, 1). So now we have
R' = {(1, 2), (2, 1)}. To make R' transitive, we must add (1, 1) and (2, 2). Call the
resulting relation R''. Then R'' would be reflexive, except that neither 3 nor 4 is related to
itself. In fact, they are related to nothing. We cannot find an example of a relation R
that is symmetric and transitive but not reflexive if we insist that all elements of the
domain be related under R to something.
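
The definitions of these properties translate almost word for word into Python for finite relations stored as sets of pairs. The sketch below, with function names chosen for the example, checks the relation R'' from the argument above.

```python
def is_reflexive(r, a):
    """Every element of the underlying set a is related to itself."""
    return all((x, x) in r for x in a)

def is_symmetric(r):
    """Whenever (x, y) is in r, so is (y, x)."""
    return all((y, x) in r for (x, y) in r)

def is_antisymmetric(r):
    """For distinct x and y, r never contains both (x, y) and (y, x)."""
    return all(x == y or (y, x) not in r for (x, y) in r)

def is_transitive(r):
    """Whenever (x, y) and (y, z) are in r, so is (x, z)."""
    return all((x, z) in r for (x, y) in r for (y2, z) in r if y == y2)

A = {1, 2, 3, 4}
r_double_prime = {(1, 2), (2, 1), (1, 1), (2, 2)}
```

Note that reflexivity is the only one of these properties that depends on the underlying set: r_double_prime is reflexive on {1, 2} but not on A.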


A.3.4  Equivalence  Relations 

Although all but one of the combinations we just described are reasonable, one
combination is of such great importance that we give it a special name. Given a domain
A, a relation R ⊆ A × A is an equivalence relation iff it is reflexive, symmetric, and
transitive. Equality (for numbers, strings, or whatever) is an equivalence relation (no
coincidence, given the name). So is our Address (lives at same address) relation.

Equality is a very special sort of equivalence relation because it relates an object
only to itself. It doesn't help us much to carve up a large set into useful subsets. But
other equivalence relations may serve as very useful ways to carve up a set. To see why,
consider a set A, with five elements, which we can draw as a set of vertices as shown in
Figure A.5(a). Having done that, we can build an equivalence relation R on A. First,
we'll relate each vertex to itself. That will make the relation reflexive. Once that is
done, we'll have the relation shown in Figure A.5(b).

Now let's add one additional element, (1, 2), to R. As soon as we do that, we must
also add (2, 1), since R must be symmetric. At this point, we have the relation shown in
Figure A.5(c). Suppose we now add (2, 3). We must also add (3, 2) to maintain
symmetry. In addition, because we have (1, 2) and (2, 3), we must add (1, 3) for transitivity.
And then we need (3, 1) to restore symmetry. That gives us the relation shown in
Figure A.5(d).

Notice what happened here. As soon as we related 3 to 2, we were also forced to
relate 3 to 1. If we hadn't, we would no longer have had an equivalence relation. See what
happens now if you add (3, 4) to R.


FIGURE A.5 Building an equivalence relation.

What we've seen in this example is that an equivalence relation R on a set S carves
S up into a set of clusters or islands, which we'll call equivalence classes. This set of
equivalence classes has the following key property:

∀s, t ∈ S ((s ∈ classᵢ ∧ (s, t) ∈ R) → t ∈ classᵢ).

In other words, all elements of S that are related under R are in the same
equivalence class. To describe equivalence classes, we'll use the notation [x] to mean the
equivalence class to which x belongs. Or we may just write [description], where
description is some clear property shared by all the members of the class. Notice that, in
general, there may be lots of different ways to describe the same equivalence class. In
our example, for instance, [1], [2], and [3] are different names for the same equivalence
class, which includes the elements 1, 2, and 3. In this example, there are two other
equivalence classes as well: [4] and [5].

Recall that Π is a partition of a set A iff (a) no element of Π is empty; (b) all
members of Π are disjoint; and (c) the union of all the elements of Π equals A. If R is an
equivalence relation on a nonempty set A, then the set of equivalence classes of R
constitutes a partition of A. In other words, if we want to take a set A and carve it up into
a set of subsets, an equivalence relation is a good way to do it.
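
To make this concrete, here is a minimal Python sketch that groups the five-element example into its equivalence classes. The function name equivalence_classes is an assumption for this example, and the code assumes that the predicate it is given really is an equivalence relation, so comparing against a single representative of each class is enough.

```python
def equivalence_classes(a, related):
    """Partition a into classes, assuming related is an equivalence relation."""
    classes = []
    for x in a:
        for cls in classes:
            # By symmetry and transitivity, one representative suffices.
            if related(x, next(iter(cls))):
                cls.add(x)
                break
        else:
            classes.append({x})  # x starts a new class
    return classes

A = [1, 2, 3, 4, 5]
# The relation built in the text: every vertex related to itself, plus 1, 2, 3 interrelated.
R = {(x, x) for x in A} | {(1, 2), (2, 1), (2, 3), (3, 2), (1, 3), (3, 1)}

classes = equivalence_classes(A, lambda x, y: (x, y) in R)
```

The resulting classes are nonempty, pairwise disjoint, and together cover A, which is exactly what it means for them to form a partition.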


EXAMPLE  A.4  Some  Equivalence  Relations 

All  of  the  following  relations  are  equivalence  relations: 

•  The  Address  relation  carves  up  a  set  of  people  into  subsets  of  people  who  live 
together. 

• Let A be the set of all strings of letters. Let Samelength ⊆ A × A relate
strings whose lengths are the same. Samelength is an equivalence relation that
carves up the universe of all strings into a collection of subsets, one for each
natural number (i.e., strings of length 0, strings of length 1, etc.).

• Let Z be the set of integers. Let ≡₃ ⊆ Z × Z relate integers that are equivalent
modulo 3. In other words, they have the same remainder when divided by 3. ≡₃
is an equivalence relation with three equivalence classes, [0], [1], and [2]. [0]
includes 0, 3, 6, etc. [1] includes 1, 4, 7, etc. And [2] includes 2, 5, 8, etc. We will use
the notation ≡ₙ for positive integer values of n to mean equivalent modulo n.

• Let CP be the set of C programs, each of which accepts an input of variable
length. We will call the length of any specific input n. Let Samecomplexity ⊆
CP × CP relate two programs iff their running-time complexity is the same.
More precisely, let Runningtime(c, n) be the maximum time required for
program c to run on an input of length n. Then (c₁, c₂) ∈ Samecomplexity iff there
exist natural numbers m₁, m₂, k such that:

∀n > k (Runningtime(c₁, n) ≤ m₁ · Runningtime(c₂, n) ∧
Runningtime(c₂, n) ≤ m₂ · Runningtime(c₁, n)).

Samecomplexity is an equivalence relation. We will have a lot more to say about
relations like it in Part V.

Not every relation that connects "similar" things is an equivalence relation. For
example, define Similarcost(x, y) to hold if the price of x is within $1 of the price of y.
Suppose X₁ costs $10, X₂ costs $10.50, and X₃ costs $11.25. Then Similarcost(X₁, X₂) and
Similarcost(X₂, X₃), but not Similarcost(X₁, X₃). Similarcost is not transitive, although
it is reflexive and symmetric. So Similarcost is not an equivalence relation.
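
The failure of transitivity can be confirmed by brute force over the three items. This sketch uses the prices from the text; the dictionary price and the function name similar_cost are assumptions made for the example.

```python
from itertools import product

price = {"X1": 10.00, "X2": 10.50, "X3": 11.25}

def similar_cost(x, y):
    """True iff the price of x is within $1 of the price of y."""
    return abs(price[x] - price[y]) <= 1.00

items = list(price)
reflexive = all(similar_cost(x, x) for x in items)
symmetric = all(
    similar_cost(y, x)
    for x, y in product(items, repeat=2)
    if similar_cost(x, y)
)
transitive = all(
    similar_cost(x, z)
    for x, y, z in product(items, repeat=3)
    if similar_cost(x, y) and similar_cost(y, z)
)
```

The triple (X1, X2, X3) is the counterexample: each neighbor is within $1, but the two ends differ by $1.25.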


A.3.5 Orderings

Important as equivalence relations are, they are not the only special kind of relation
worth mentioning. Let's consider two more.

A  partial  order  is  a  relation  that  is  reflexive,  antisymmetric,  and  transitive.  Let  R  be 
a  partial  order  defined  on  a  set  A.  Then  the  pair  (A,  R)  is  a  partially  ordered  set.  If  we 
write out any partial order as a graph, we'll see a structure like the ones in the
following examples. Notice that, to make the graph relatively easy to read, we'll adopt the
convention that we don't write in the links that are required by reflexivity and
transitivity. But, of course, they are there in the relations themselves.


EXAMPLE  A.5  Subset-of  is  a  Partial  Order 

Consider the relation Subset-of, defined on the set of all sets. Subset-of is a partial
order, since it is reflexive (every set is a subset of itself), transitive (if A ⊆ B and
B ⊆ C, then A ⊆ C) and antisymmetric (if A ⊆ B and A ≠ B, then it must not be
true that B ⊆ A). A small piece of Subset-of can be drawn as:


[Diagram: a piece of the Subset-of graph. Recoverable node labels: The set of all sets;
Z (the integers); P (the set of people on earth); Odd numbers; Even numbers.]


Read an arrow from x to y as meaning that (x, y) is an element of Subset-of. So,
in this example, {3} is a subset of {3, 5}. Note that in a partial order, it is often the
case that there are some elements (such as {3, 5} and {2}) that are not related to
each other at all (since neither is a subset of the other). Remember in reading this
picture that we have omitted the reflexive and transitive arrows.
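
The three defining properties can be checked by brute force on a small family of sets. This is only a sketch: the family below is an assumed sample (it includes {3}, {3, 5}, and {2} from the example), and a finite check illustrates the claim rather than proving it for the set of all sets.

```python
from itertools import product

family = [frozenset(), frozenset({2}), frozenset({3}), frozenset({3, 5})]

def subset_of(a, b):
    """True iff every element of a is also an element of b."""
    return a <= b

reflexive = all(subset_of(a, a) for a in family)
antisymmetric = all(
    a == b or not (subset_of(a, b) and subset_of(b, a))
    for a, b in product(family, repeat=2)
)
transitive = all(
    subset_of(a, c)
    for a, b, c in product(family, repeat=3)
    if subset_of(a, b) and subset_of(b, c)
)

# {2} and {3, 5} are incomparable: neither is a subset of the other.
incomparable = not subset_of(frozenset({2}), frozenset({3, 5})) and not subset_of(
    frozenset({3, 5}), frozenset({2})
)
```

The incomparable pair is what makes the order partial rather than total.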

EXAMPLE A.6 Proper-Subset-of is Not a Partial Order

Now consider the relation Proper-subset-of. It is not a partial order because it is
not reflexive. For example, {1} ⊄ {1}.


In  many  kinds  of  applications,  it  is  useful  to  organize  the  objects  we  are  dealing  with 
by  defining  a  partial  order  that  corresponds  to  the  notion  of  one  object  being  more  or 
less  general  than  another.  Such  a  relation  may  be  called  a  subsumption  relation. 


EXAMPLE  A.7  Concepts  Form  a  Subsumption  Relation 

Consider a set of concepts, each of which corresponds to some significant set of
entities in the world. Some concepts are more general than others. We'll say that a
concept x is subsumed by a concept y (written x ≤ y) iff every instance of x is also
an instance of y. Alternatively, y is at least as general as x. A small piece of this
subsumption relation for some concepts that might be used to model the meanings
of common English words is:


[Figure: a small piece of the concept subsumption graph. Recoverable node labels:
Thing; Mammal; Pet; Vehicle.]


Concept subsumption is a partial order. It is very similar to the Subset-of relation
except that it is defined only on the specific subsets that have been defined as
concepts.


Subsumption relations are useful because they tell us when we have new information.
If we already know that some object X₁ is a cat, we learn nothing new when told
that it is an animal.
EXAMPLE A.8 Logical Statements Form a Subsumption Lattice

A first-order logic sentence P is subsumed by another sentence Q (written P ≤ Q)
iff, whenever Q is true P must be true, regardless of the values assigned to the
variables, functions, and predicates of P and Q. For example: ∀x (P(x)) subsumes
P(X₁), since, regardless of what the predicate P is and what axioms we have
about it, and regardless of what object X₁ represents, if ∀x (P(x)) is true, then
P(X₁) must be true. Why is this a useful notion? Suppose that we are building a
theorem-proving or reasoning program. If we already know ∀x P(x), and we are
then told P(X₁), we can throw away this new fact. It doesn't add to our knowledge
(except perhaps to focus our attention on the object X₁), since it is subsumed
by something we already knew. A small piece of the subsumption relation
on sentences is shown in the following graph:


The  subsumption  relation  on  sentences  is  a  partial  order. 


The symbol ≤ is often used to denote a partial order. Let ≤ be an arbitrary partial
order defined on some domain A. Any element x of A such that ∀y ∈ A ((y ≤ x) →
(y = x)) is a minimal element of A with respect to ≤. In other words, x is a minimal
element if there are no other elements less than or equal to it. Similarly, any element x of
A such that ∀y ∈ A ((x ≤ y) → (y = x)) is a maximal element of A with respect to ≤.
There may be more than one minimal (or maximal) element in a partially ordered set.
For example, the partially ordered set of concepts in Example A.7 has two minimal
elements, Pet Cat and Vehicle. If there is a unique minimal element it is called the least
element. If there is a unique maximal element it is called the greatest element. The set
of logical sentences ordered by subsumption has a greatest element, False, which
subsumes everything. It makes the strongest, and in fact, unsatisfiable claim. There is also
a least element, True, which makes the weakest possible claim, and is subsumed by all
other sentences.
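
The definitions of minimal and maximal elements translate directly into code for a finite set. The following is a minimal Python sketch, not part of the original text; the sample set and the use of divisibility as the partial order are assumptions chosen for illustration.

```python
def minimal_elements(elems, leq):
    # x is minimal iff every y with y <= x is x itself.
    return {x for x in elems if all(y == x for y in elems if leq(y, x))}

def maximal_elements(elems, leq):
    # x is maximal iff every y with x <= y is x itself.
    return {x for x in elems if all(y == x for y in elems if leq(x, y))}

# Hypothetical example: order a few integers by "a divides b".
nums = {2, 3, 4, 6, 12}

def divides(a, b):
    return b % a == 0

print(minimal_elements(nums, divides))  # {2, 3}
print(maximal_elements(nums, divides))  # {12}
```

Here there are two minimal elements, so there is no least element, while 12 is the unique maximal element and hence the greatest element of this sample.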

A total order R ⊆ A × A is a partial order that has the additional property that
∀x, y ∈ A ((x, y) ∈ R ∨ (y, x) ∈ R). In other words, every pair of elements must be
related to each other one way or the other. The classic example of a total order is ≤ (or
≥, if you prefer) on the integers. The ≤ relation is a partial order and, given any two
integers x and y, either x ≤ y or y ≤ x. If we draw any total order as a graph, we'll get
something that looks like the graph in Figure A.6 (again without the reflexive and
transitive links shown).

FIGURE A.6 Drawing a total order as a graph.

This is only a tiny piece of the graph, of course. It continues infinitely in both
directions. But notice that, unlike our earlier examples of partial orders, there is no splitting
in this graph. For every pair of elements, one is above and one is below. If R is a total
order defined on a set A, then the pair (A, R) is a totally ordered set. Of course, not all
partial orders are also total. For example, the Subset-of relation we described in
Example A.5 is not a total order.

Given a partially ordered set (A, R), an infinite descending chain is a totally
ordered, with respect to R, subset B of A that has no minimal element. If (A, R) contains
no infinite descending chains then it is called a well-founded set. An equivalent
definition is the following: A partially ordered set (A, R) is a well-founded set iff every
nonempty subset of A has at least one minimal element with respect to R. If (A, R) is a
well-founded set and R is a total order, then (A, R) is called a well-ordered set. Every
well-ordered set has a least element. For example, consider the sets N (the natural
numbers) and Z (the integers). The totally ordered set (N, ≤) is well-founded and
well-ordered. Its least element is 0. The totally ordered set (Z, ≤) is neither well-founded
nor well-ordered, since it contains an infinite number of infinite descending chains, such as
3, 2, 1, 0, -1, -2, ....

Table A.4 reviews some of our examples.

Well-founded and well-ordered sets are important. Well-ordered sets provide the
basis for proofs by induction (as we'll see in Section A.6.5). Well-founded sets (that are
often also well-ordered) provide the basis for proofs that loops and recursively defined
functions halt (as we'll see in Section A.7.1).


Table A.4 Checking for well-foundedness and well-orderedness.

(A, R)                                                         Well-founded?   Well-ordered?
The set of sets with respect to the subset-of relation         Yes             No
The set of concepts with respect to subsumption                Yes             No
The set of first-order sentences with respect to subsumption   Yes             No
The set of natural numbers under ≤                             Yes             Yes
The set of integers under ≤                                    No              No
A.4 Functions

Relations are very general. They allow an object to be related to any number of other
objects at the same time (as they are, for example, in our Dessert relation). Sometimes
we want a more restricted notion, in which each object is related to a unique other
object. For example, (at least in an ideal world without criminals or incompetent
bureaucrats) each United States resident is related to a unique social security number. To
capture this idea we need functions.


A.4.1 What is a Function?

We begin with the common definition of a function: A function f from a set A to a set
B is a binary relation that is a subset of A × B, with the additional property that:

∀x ∈ A ((((x, y) ∈ f ∧ (x, z) ∈ f) → y = z) ∧ ∃y ∈ B ((x, y) ∈ f)).

In  other  words,  each  element  of  A  is  related  to  exactly  one  element  of  B. 
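
For finite relations, the "exactly one" condition can be tested directly. The following is a minimal Python sketch, not part of the original text. The pairs used for the dessert relation are hypothetical stand-ins (the actual Dessert relation is defined earlier in the appendix), and succ is restricted to a finite window of integers so that it can be enumerated.

```python
def is_function(rel, domain, codomain):
    """True iff rel is a subset of domain x codomain that relates
    each element of domain to exactly one element of codomain."""
    if not all(x in domain and y in codomain for (x, y) in rel):
        return False
    return all(len({y for (a, y) in rel if a == x}) == 1 for x in domain)

# Hypothetical pairs in the spirit of the Dessert relation: two people occur twice.
people = {"Dave", "Sara"}
desserts = {"cake", "pie", "fruit"}
dessert = {("Dave", "cake"), ("Dave", "pie"), ("Sara", "cake"), ("Sara", "fruit")}

# A finite fragment of succ(n) = n + 1.
window = set(range(-3, 3))
succ_fragment = {(n, n + 1) for n in window}

print(is_function(dessert, people, desserts))                 # False
print(is_function(succ_fragment, window, set(range(-3, 4))))  # True
```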

The Dessert relation we defined earlier is not a function since Dave and Sara each
occur twice. We haven't restricted each person to precisely one dessert. A simple
relation that is a function is the successor function succ defined on the integers:

succ(n) = n + 1.

Of course, we cannot write out all the elements of succ (since there are an infinite
number of them), but succ includes:

{..., (-3, -2), (-2, -1), (-1, 0), (0, 1), (1, 2), (2, 3), ...}.

It  is  useful  to  define  some  additional  terms  to  make  it  easy  to  talk  about  functions. 
We  start  by  writing: 

f: A → B,

which means that f is a function from the set A to the set B. We call A the domain of f
and B the codomain or range of f. We may also say that f is a function from A to B.
Using this notation, we can write function definitions that have two parts, the first of
which specifies the domain and range and the second of which defines the way in which
the elements of the range are related to the elements of the domain. So, for example,
we define the successor function on the integers (denoted as Z) by writing:

succ: Z → Z,
succ(n) = n + 1.

If x ∈ A, then we write:

f(x),

which we read as "f of x", to indicate the element of B to which x is related. We call this
element the image of x under f or the value of f for x. Note that, given the definition of
a function, there must be exactly one such element. We will also call x the argument of
f. For example we have that:

succ(1) = 2, succ(2) = 3, ...

Thus 2 is the image (or the value) of the argument 1 under succ.

We will also use the notation f(x) to refer to the function f (as opposed to f's value
at a specific point x) whenever we need a way to refer to f's argument. So, for example,
we'll write, as we did above, succ(n) = n + 1.

The function succ is a unary function. It maps from a single element (a number) to
another element. We are also interested in functions that map from ordered pairs of
elements to a value. We call such functions binary functions. For example, integer
addition is a binary function:

+: (Z × Z) → Z.

Thus + includes elements such as ((2, 3), 5), since 2 + 3 is 5. We could also write:

+((2,3))  =  5. 

We have used double parentheses here because we are using the outer set to
indicate function application (as we did above without confusion for succ) and the inner
set to define the ordered pair to which the function is being applied. But this is
confusing. So, generally, when the domain of a function is the Cartesian product of two or
more sets, as it is here, we drop the inner set of parentheses and simply write:

+(2, 3) = 5.

The prefix notation that we have used so far, in which we write the name of the
function first, followed by its arguments, can be used for functions of any number of
arguments. For the specific, common case of binary functions, it is often convenient to use
an alternative: infix notation, in which the function name (often called the operator) is
written between its two arguments:

2  +  3  =  5. 

So far, we have considered unary functions and binary functions. But just as we
could define n-ary relations for arbitrary values of n, we can define n-ary functions. For
any positive integer n, an n-ary function f is a function that is defined as:

f: (S₁ × S₂ × ... × Sₙ) → R.

For example, let Z be the set of integers. Then,

quadraticequation: (Z × Z × Z) → F

is a function whose domain is an ordered triple of integers and whose range is a set of
functions. The definition of quadraticequation is:

quadraticequation(a, b, c) = ax² + bx + c.

What we did here is typical of function definition. First we specify the domain and
the range of the function. Then we define how the function is to compute its value (an
element of the range) given its arguments (an element of the domain).
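
A function whose values are themselves functions is easy to model in a language with closures. The following is a minimal Python sketch, not part of the original text; the name quadratic_equation and the chosen coefficients are assumptions for illustration.

```python
from typing import Callable

def quadratic_equation(a: int, b: int, c: int) -> Callable[[float], float]:
    """Map the triple (a, b, c) to the function x -> a*x^2 + b*x + c."""
    def q(x: float) -> float:
        return a * x * x + b * x + c
    return q

# The value of quadratic_equation at (1, -3, 2) is itself a function.
q = quadratic_equation(1, -3, 2)   # x^2 - 3x + 2
print(q(0), q(1), q(2))            # 2 0 0
```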

Whenever the domain of a function f is an ordered n-tuple of elements drawn from
a single set S, we may (loosely) say that the domain of f is S. In this case, we may also
say that f is a function of n arguments. So, for example, we may talk about the binary
function + on the domain N (when, properly, its domain is N × N).
Recall that, in the last section, we said that we could compose binary relations to
derive new relations. Clearly, since functions are just special kinds of binary relations, if
we can compose binary relations we can certainly compose binary functions. Because a
function returns a unique value for each argument, it generally makes a lot more sense
to compose functions than it does relations, and you'll see that although we rarely
compose relations that aren't functions, we compose functions all the time. So, following
our definition above for relations, we define the composition of two functions
f ⊆ A × B and g ⊆ B × C, written g ∘ f, as:

g ∘ f = {(a, c) : ∃b ((a, b) ∈ f and (b, c) ∈ g)}.

Notice that the composition of two functions must necessarily also be a function. We
mentioned above that there is sometimes confusion about the order in which relations
(and now functions) should be applied when they are composed. To avoid this
problem, we will introduce a new notation g(f(x)). We use the parentheses here to indicate
function application, just as we did above. So (g ∘ f)(x) = g(f(x)). This notation is clear.
Apply g to the result of first applying f to x. This notation reads right to left as does our
definition of the ∘ notation.
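
The set-of-pairs definition of composition can be written out almost verbatim. The following is a minimal Python sketch, not part of the original text; the finite fragments of succ and of a doubling function are assumptions chosen so that the sets can be enumerated.

```python
def compose(g, f):
    """Return g o f for relations given as sets of pairs: apply f first, then g."""
    return {(a, c) for (a, b) in f for (b2, c) in g if b == b2}

# Finite fragments of two functions on the natural numbers.
succ = {(n, n + 1) for n in range(5)}      # defined on 0..4
double = {(n, 2 * n) for n in range(7)}    # defined on 0..6

print(sorted(compose(double, succ)))  # [(0, 2), (1, 4), (2, 6), (3, 8), (4, 10)]
print(sorted(compose(succ, double)))  # [(0, 1), (1, 3), (2, 5)]
```

The two results differ, which illustrates why the order of application matters. The second result is smaller only because the fragment of succ is undefined above 4.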

A.4.2 Properties of Functions

Some functions possess properties that may make them particularly useful for certain
tasks. The definition that we gave for a function at the beginning of this section
required that, for f: A → B to be a function, it must be defined for every element of A
(i.e., every element of A must be related to some element of B). This is the standard
mathematical definition of a function. But, as we pursue the idea of "computable
functions" (i.e., functions that can be implemented on some reasonable computing
platform), we'll see that there are functions whose domains cannot be effectively defined.

For example, consider a function steps whose input is a Java program and whose
result is the number of steps that are executed by the program on the input 0. This
function is undefined for programs that do not halt on the input 0. As we'll see in Chapter
19, there can exist no program that can check a Java program and determine whether
or not it will halt on the input 0. So there is no program that can look at a possible input
to steps and determine whether that input is in steps's domain. In Chapter 25, we will
consider two approaches to fixing this problem. One is to extend the range of steps, for
example by adding a special value, Error, that can be the result of applying steps to a
program that doesn't halt on input 0. The difficulty with this approach is that steps
becomes uncomputable since there exists no algorithm that can know when to return
Error. Our alternative is to expand the domain of steps, for example to the set of all
Java programs. Then we must acknowledge that if steps is applied to certain elements
of its domain (i.e., programs that don't halt), its value will be undefined.

In order to be able to talk about functions like steps, we'll introduce two new terms. We'll
say that f: A → B is a total function on A iff it is a function that is defined on all elements
of A (i.e., it is a function in the standard mathematical sense). We'll say that f: A → B is a
partial function on A iff f is a subset of A × B and every element of A is related to no
more than one element of B. In Chapter 25 we will return to a discussion of partial
functions. Until then, when we say that f is a function, we will mean that it is a total function.
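
The distinction between total and partial functions can be made concrete for finite relations. The following is a minimal Python sketch, not part of the original text. It does not model steps (which cannot be computed); instead it uses a hypothetical exact-square-root relation, which is defined only on perfect squares.

```python
def is_partial_function(rel, domain, codomain):
    """rel is a subset of domain x codomain and no element has more than one value."""
    if not all(a in domain and b in codomain for (a, b) in rel):
        return False
    firsts = [a for (a, _) in rel]
    return len(firsts) == len(set(firsts))

def is_total_function(rel, domain, codomain):
    """A partial function that is also defined on every element of domain."""
    defined_on = {a for (a, _) in rel}
    return is_partial_function(rel, domain, codomain) and defined_on == set(domain)

# Hypothetical example: exact integer square roots of the numbers 0..9.
digits = set(range(10))
exact_sqrt = {(n * n, n) for n in range(4)}   # {(0, 0), (1, 1), (4, 2), (9, 3)}

print(is_partial_function(exact_sqrt, digits, digits))        # True
print(is_total_function(exact_sqrt, digits, digits))          # False: 2 has no value
print(is_total_function(exact_sqrt, {0, 1, 4, 9}, digits))    # True
```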
A function f: A → B is one-to-one iff no two elements of A map to the same
element of B. Succ is one-to-one. For example, the only number to which we can apply
succ and derive 2 is 1. Quadraticequation is also one-to-one. But + isn't. For example,
both +(2, 3) and +(4, 1) equal 5.

A function f: A → B is onto iff every element of B is the value of some element of
A. Another way to think of this is that a function is onto iff all of the elements of B are
"covered" by the function. As we defined it above, succ is onto. But let's define a
different function succ' on N (the natural numbers), rather than the integers. So we define:

succ': N → N,
succ'(n) = n + 1.

succ' is not onto because there is no natural number i such that succ'(i) = 0.

The easiest way to envision the differences between an arbitrary relation, a
function, a one-to-one function, and an onto function is to make two columns (the first for
the domain and the second for the range) and think about the kind of matching
problems you probably had on tests in elementary school.

Consider the six matching problems shown in Figure A.7. In each, we'll consider
ways of relating the elements of the first column (the domain) to the elements of the
second column (the range). Example 1 describes a relation that is not a (total)
function because C is an element of its domain that is not related to any element of its
range. Example 2 describes a relation that is not a function because there are three
values associated with A. The third example is a function since, for each object in the
first column, there is a single value in the second column. But this function is neither
one-to-one (because X is derived from both A and B) nor onto (because Z is not the
image of anything). The fourth example is a function that is one-to-one (because no
element of the second column is related to more than one element of the first
column). But it still isn't onto because Z has been skipped: Nothing in the first column
derives it. The fifth example is a function that is onto (since every element of
column two has an arrow coming into it), but it isn't one-to-one, since Z is derived from
both C and D. The sixth and final example is a function that is both one-to-one and
onto. By the way, see if you can modify either example 4 or example 5 to make it
both one-to-one and onto. You're not allowed to change the number of elements in
either column, just the arrows. You'll notice that you can't do it. In order for a
function to be both one-to-one and onto, there must be equal numbers of elements in
the domain and the range.


FIGURE  A.7  Kinds  of  relations  and  functions. 
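
For finite functions, both properties reduce to simple checks on the set of values. The following is a minimal Python sketch, not part of the original text. Figure A.7 is not reproduced here, so the arrow assignments below are assumptions chosen only to agree with the prose description of examples 3 through 6.

```python
def is_one_to_one(f: dict) -> bool:
    # No two arguments share a value.
    return len(set(f.values())) == len(f)

def is_onto(f: dict, codomain: set) -> bool:
    # Every element of the codomain is the value of some argument.
    return set(f.values()) == codomain

second_column = {"X", "Y", "Z"}
example3 = {"A": "X", "B": "X", "C": "Y"}            # X is hit twice, Z is skipped
example4 = {"A": "X", "B": "Y"}                      # Z is skipped
example5 = {"A": "X", "B": "Y", "C": "Z", "D": "Z"}  # Z is hit twice
example6 = {"A": "X", "B": "Y", "C": "Z"}            # a bijection

for name, f in [("3", example3), ("4", example4), ("5", example5), ("6", example6)]:
    print(name, is_one_to_one(f), is_onto(f, second_column))
```

Representing a function as a dictionary already guarantees that each argument has exactly one value, so only the one-to-one and onto conditions need to be tested.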
The inverse of a binary function f is the relation that contains exactly the ordered
pairs in f with the elements of each pair reversed. We'll write the inverse of f as f⁻¹.
Formally, if f ⊆ A × B, then f⁻¹ ⊆ B × A = {(b, a) : (a, b) ∈ f}. Since every function
is a relation, every function has a relational inverse, but that relational inverse may or
may not also be a function. For example, look again at example 3 of the matching
problem above. Although it is a function, its inverse is not. Given the argument X, should
we return the value A or B? Now consider example 4. Its inverse is also not a function,
since there is no value to be returned for the argument Z. Example 5 has the same
problem example 3 does. Now look at example 6. Its inverse is a function. Whenever a
function is both one-to-one and onto, its inverse will also be a function and that
function will be both one-to-one and onto. Such functions are called bijections. Bijections
are useful because they enable us to move back and forth between two sets without
loss of information. Look again at example 6. We can think of ourselves as operating in
the {A, B, C} universe or in the {X, Y, Z} universe interchangeably since we have a
well defined way to move from one to the other. And if we move from column one to
column two and then back, we'll be exactly where we started.
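
The relational inverse is a one-line construction, and we can then ask whether the result is still a function. The following is a minimal Python sketch, not part of the original text; the pairs standing in for examples 3 and 6 are assumptions that agree with the prose, since the figure itself is not reproduced here.

```python
def inverse(rel):
    # Reverse every ordered pair.
    return {(b, a) for (a, b) in rel}

def is_total_function(rel, domain):
    # Each element of domain must have exactly one related value.
    return all(len({b for (a, b) in rel if a == x}) == 1 for x in domain)

second_column = {"X", "Y", "Z"}
example3 = {("A", "X"), ("B", "X"), ("C", "Y")}   # neither one-to-one nor onto
example6 = {("A", "X"), ("B", "Y"), ("C", "Z")}   # a bijection

print(is_total_function(inverse(example3), second_column))  # False
print(is_total_function(inverse(example6), second_column))  # True
print(inverse(inverse(example6)) == example6)               # True
```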

It is sometimes useful to talk about functions that map one object to another but, in
so doing, do not fundamentally change the way that the objects behave with respect to
some structure (i.e., some set of functions that we care about). A homomorphism is a
function that maps the elements of its domain to the elements of its range in such a way
that some structure of the original set is preserved. So, considering just binary
functions, if f is a homomorphism and # is a binary function in the structure that we are
considering, then it must be the case that ∀x, y (f(x) # f(y) = f(x # y)). The structure of
unary and higher order functions must also be preserved in a similar way.

Given a particular function f, whether or not it is a homomorphism depends on the
structure that we are considering. So, for example, consider the integers, along with
one function, addition. Then the function f(x) = 2x is a homomorphism because
2x + 2y = 2(x + y). But, if the structure we are working with also contains a second
function, multiplication, then f is no longer a homomorphism because, unless x or y is
0, 2x · 2y ≠ 2(xy).
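
The homomorphism condition can be spot-checked on a finite sample. The following is a minimal Python sketch, not part of the original text; the sample range is an assumption. A passing check on a sample does not prove the property for all integers, but a failing check does refute it.

```python
import operator
from itertools import product

def preserves(f, op, sample):
    """Check f(x) op f(y) == f(x op y) for every pair drawn from sample."""
    return all(op(f(x), f(y)) == f(op(x, y)) for x, y in product(sample, repeat=2))

def double(x):
    return 2 * x

sample = range(-5, 6)
print(preserves(double, operator.add, sample))  # True
print(preserves(double, operator.mul, sample))  # False, e.g. 2*1 * 2*1 != 2*(1*1)
```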

If a homomorphism f is also a bijection, then it is called an isomorphism. If two
objects are isomorphic to each other, then they are indistinguishable with respect to
the defining structure. For example, consider the set of undirected graphs, along
with all of the standard graph operations that determine size and paths. If G is an
arbitrary graph, let f(G) be exactly G except that the symbol # is appended to the
name of every vertex. This function f is an isomorphism. The two graphs shown in
Figure A.8 are isomorphic.

FIGURE A.8 Two isomorphic graphs.

When the intersection of the domain and the range of a function f is not empty, it is
sometimes useful to find elements of the domain that are unchanged by the application
of f. A fixed point of a function f is an element x of f's domain with the property that
f(x) = x. For example, 1 and 2 are fixed points of the factorial function since 1! = 1
and 2! = 2. The factorial function has no other fixed points.
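
A bounded search confirms the factorial example. The following is a minimal Python sketch, not part of the original text; the search bound of 20 is an arbitrary assumption, and a finite search by itself does not prove that no larger fixed points exist.

```python
from math import factorial

def fixed_points(f, candidates):
    # Keep the candidates that f maps to themselves.
    return [x for x in candidates if f(x) == x]

print(fixed_points(factorial, range(20)))  # [1, 2]
```

Note that 0 is not a fixed point, since 0! = 1.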


A.4.3  Properties  of  Binary  Functions 

Any relation that uniquely maps from each element of its domain to some element of
its range is a function. The two sets involved can be anything and the mapping can be
arbitrary. However, most of the functions we actually care about behave in some sort of
regular fashion. It is useful to articulate a set of properties that many of the functions
that we'll study have. When these properties are true of a function, or a set of functions,
they give us techniques for proving additional properties of the objects involved. In the
following definitions, we will consider an arbitrary binary function # defined over a set
A. As examples, we'll consider functions whose actual domains are ordered pairs of
sets, integers, strings, and Boolean expressions.

• A binary function # is commutative iff ∀x, y ∈ A (x # y = y # x). Examples:

    i + j = j + i.                       (integer addition)
    A ∩ B = B ∩ A.                       (set intersection)
    P ∧ Q ≡ Q ∧ P.                       (Boolean and)

• A binary function # is associative iff ∀x, y, z ∈ A ((x # y) # z = x # (y # z)). Examples:

    (i + j) + k = i + (j + k).           (integer addition)
    (A ∩ B) ∩ C = A ∩ (B ∩ C).           (set intersection)
    (P ∧ Q) ∧ R ≡ P ∧ (Q ∧ R).           (Boolean and)
    (s || t) || w = s || (t || w).       (string concatenation)

• A binary function # is idempotent iff ∀x ∈ A (x # x = x). Examples:

    min(i, i) = i.                       (integer minimum)
    A ∩ A = A.                           (set intersection)
    P ∧ P ≡ P.                           (Boolean and)

• The distributivity property relates two binary functions: A function # distributes over
another function % iff ∀x, y, z ∈ A (x # (y % z) = (x # y) % (x # z)). Examples:

    i · (j + k) = (i · j) + (i · k).     (integer multiplication over addition)
    A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).     (set union over intersection)
    P ∧ (Q ∨ R) ≡ (P ∧ Q) ∨ (P ∧ R).     (Boolean and over or)

• Absorption laws also relate two binary functions to each other: A function #
absorbs another function % iff ∀x, y ∈ A (x # (x % y) = x). Examples:

    A ∩ (A ∪ B) = A.                     (Set intersection absorbs union.)
    P ∨ (P ∧ Q) ≡ P.                     (Boolean or absorbs and.)
    P ∧ (P ∨ Q) ≡ P.                     (Boolean and absorbs or.)
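
Each of these properties is a universally quantified equation, so each can be tested exhaustively on a small sample. The following is a minimal Python sketch, not part of the original text; the helper names and the sample values are assumptions, and a check over a finite sample can refute a property but cannot prove it for an infinite set.

```python
import operator
from itertools import product

def commutative(op, s):
    return all(op(x, y) == op(y, x) for x, y in product(s, repeat=2))

def associative(op, s):
    return all(op(op(x, y), z) == op(x, op(y, z)) for x, y, z in product(s, repeat=3))

def idempotent(op, s):
    return all(op(x, x) == x for x in s)

def distributes_over(op, other, s):
    return all(op(x, other(y, z)) == other(op(x, y), op(x, z))
               for x, y, z in product(s, repeat=3))

def absorbs(op, other, s):
    return all(op(x, other(x, y)) == x for x, y in product(s, repeat=2))

ints = range(-3, 4)
strings = ["", "a", "bc"]
bools = [False, True]

print(commutative(operator.add, ints))                     # True
print(commutative(operator.add, strings))                  # False: "a"+"bc" != "bc"+"a"
print(associative(operator.add, strings))                  # True
print(idempotent(min, ints))                               # True
print(distributes_over(operator.mul, operator.add, ints))  # True
print(distributes_over(operator.add, operator.mul, ints))  # False
print(absorbs(lambda p, q: p and q, lambda p, q: p or q, bools))  # True
```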
It is often the case that when a function is defined over some set A, there are special
elements of A that have particular properties with respect to that function. In
particular, it is worth defining what it means to be an identity and to be a zero:

• An element a is an identity for the function # iff ∀x ∈ A ((x # a = x) ∧ (a # x = x)).
Examples:

    i · 1 = i.           (1 is an identity for integer multiplication.)
    i + 0 = i.           (0 is an identity for integer addition.)
    A ∪ ∅ = A.           (∅ is an identity for set union.)
    P ∨ False ≡ P.       (False is an identity for Boolean or.)
    s || "" = s.         ("" is an identity for string concatenation.)


Sometimes  it  is  useful  to  differentiate  between  a  right  identity  (one  that  satisfies  the 
first  requirement  above)  and  a  left  identity  (one  that  satisfies  the  second  requirement 
above).  But  for  all  the  functions  we'll  be  concerned  with,  if  there  is  a  left  identity,  it  is 
also  a  right  identity  and  vice  versa,  so  we  will  talk  simply  about  an  identity. 


• An element a is a zero for the function # iff ∀x ∈ A ((x # a = a) ∧ (a # x = a)).
Examples:

    i · 0 = 0.               (0 is a zero for integer multiplication.)
    A ∩ ∅ = ∅.               (∅ is a zero for set intersection.)
    P ∧ False ≡ False.       (False is a zero for Boolean and.)

Just  as  with  identities,  it  is  sometimes  useful  to  distinguish  between  left  and  right  zeros, 
but  we  won't  need  to. 

Although  we’re  focusing  here  on  binary  functions,  there’s  one  important  property 
that  unary  functions  may  have  that  is  worth  mentioning  here: 

• A unary function $ is a self inverse iff ∀x ($($(x)) = x). In other words, if we
compose the function with itself (apply it twice), we get back the original argument.
Examples:

    -(-(i)) = i.             (Multiplying by -1 is a self inverse for integers.)
    1/(1/i) = i if i ≠ 0.    (Dividing into 1 is a self inverse for integers.)
    ¬¬A = A.                 (Complement is a self inverse for sets.)
    ¬(¬P) ≡ P.               (Negation is a self inverse for Booleans.)
    (sᴿ)ᴿ = s.               (Reversal is a self inverse for strings.)
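
Identities, zeros, and self inverses can be tested on samples in the same style. The following is a minimal Python sketch, not part of the original text; the helper names and sample values are assumptions chosen for illustration.

```python
import operator

def is_identity(a, op, s):
    # a must act as both a right and a left identity.
    return all(op(x, a) == x and op(a, x) == x for x in s)

def is_zero(a, op, s):
    # a must act as both a right and a left zero.
    return all(op(x, a) == a and op(a, x) == a for x in s)

def is_self_inverse(f, s):
    # Applying f twice must return the original argument.
    return all(f(f(x)) == x for x in s)

ints = range(-3, 4)
strings = ["", "a", "bc", "abc"]

print(is_identity(1, operator.mul, ints))          # True
print(is_identity(0, operator.add, ints))          # True
print(is_identity("", operator.add, strings))      # True
print(is_zero(0, operator.mul, ints))              # True
print(is_self_inverse(operator.neg, ints))         # True
print(is_self_inverse(lambda s: s[::-1], strings)) # True
print(is_self_inverse(lambda n: n + 1, ints))      # False
```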


A.4.4 Properties of Functions on Sets

The functions that we have defined on sets satisfy most of the properties that we have
just considered. Further, as we saw above, some set functions have a zero or an identity.
We'll summarize here (without proof) the most useful properties that hold for the
functions we have defined on sets:

• Commutativity:     A ∪ B = B ∪ A.
                     A ∩ B = B ∩ A.

• Associativity:     (A ∪ B) ∪ C = A ∪ (B ∪ C).
                     (A ∩ B) ∩ C = A ∩ (B ∩ C).

• Idempotency:       A ∪ A = A.
                     A ∩ A = A.

• Distributivity:    A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
                     A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).

• Absorption:        (A ∪ B) ∩ A = A.
                     (A ∩ B) ∪ A = A.

• Identity:          A ∪ ∅ = A.

• Zero:              A ∩ ∅ = ∅.

• Self Inverse:      ¬¬A = A.


In addition, we will want to make use of the following theorems that can be proven
to apply specifically to sets and their operations (as well as to Boolean expressions,
with ∨ substituted for ∪ and ∧ substituted for ∩):

• De Morgan's Laws:  ¬(A ∪ B) = ¬A ∩ ¬B.
                     ¬(A ∩ B) = ¬A ∪ ¬B.


A.5  Closures 

We say that a binary relation R on a set A is closed under property P iff R possesses P. For
example, the relation ≤ as generally defined on the integers is closed under transitivity.

Sometimes, if a relation R is not closed under P, we may want to ask what elements
would have to be added to R so that it would possess P. So, let R be a binary relation on
a set A. A relation R' is a closure of R with respect to some property P iff:

• R ⊆ R',

• R' is closed under P, and

• there is no smaller relation R'' that contains R and is closed under P. (One relation
R₁ is smaller than another relation R₂ iff |R₁| < |R₂|.)

In other words, to form the closure of R with respect to P we add to R the minimum
number of elements required to establish P. So, for example, the transitive closure of a
binary relation R is the smallest relation R' that contains R but is transitive. Thus, if R
contains the elements (x, y) and (y, z), the transitive closure of R must also contain the
element (x, z).

EXAMPLE  A.9  Forming  Transitive  and  Reflexive  Closures 
Let R = {(1, 2), (2, 3), (3, 4)}.

• The transitive closure of R is {(1, 2), (2, 3), (3, 4), (1, 3), (1, 4), (2, 4)}.

• The reflexive closure of R is {(1, 2), (2, 3), (3, 4), (1, 1), (2, 2), (3, 3), (4, 4)}.
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EXAMPLE  A.10  The  Transitive  Closure  of  Parent-of 

The  transitive  closure  of  Parent-of  is  Ancestor-of. 


Under  some  conditions  (which  will  hold  in  all  the  cases  we  consider),  it  is  possible  to 
prove  that  a  relation  R  has  a  unique  closure  under  the  property  P.  (See  Section  A.8  for 
a  discussion  of  this  issue.) 

We can define the closure of a set with respect to a function in a similar manner. Let
f be a function of n arguments. We say that a set A is closed under f iff, whenever all n
of f's arguments are elements of A, the value of f is also in A. For example, the positive
integers are closed under addition. The positive integers are not closed under subtraction
since, for example, 7 − 10 = −3.

As we did for relations, we may again want to consider, whenever a set A is not
closed under some function f: X → Y, how A could be augmented (with additional
elements drawn from A′) so that it would be closed. Let f be a function of n arguments
drawn from a set A. A set A′ is a closure of A under f iff:

• A ⊆ A′,

• A′ is closed under f, and

• there is no smaller set A″ that contains A and is closed under f.


EXAMPLE  A.11  Closures  under  Functions 


• {0} is not closed under the successor function succ, since succ(0) = 1. The
closure of {0} under succ is ℕ (the natural numbers).

• ℕ is closed under addition (since the sum of any two natural numbers is a natural
number). So the closure of ℕ under addition is simply ℕ.

• ℕ is not closed under subtraction. For example, 5 − 7 is not a natural number.
The closure of ℕ under subtraction is ℤ (the integers).

• ℤ is not closed under division. Its closure under division is ℚ (the rational
numbers) plus a special element that is the result of dividing by 0.

• ℚ is not closed under limits. Its closure under limits is ℝ (the real numbers).

• ℝ is not closed under square root. Its closure under square root is ℂ (the complex
numbers).

• The set of even length strings of a's and b's is closed under concatenation.

• The set of odd length strings of a's and b's is not closed under concatenation.
For example, the string aaa concatenated with the string aaa is aaaaaa, whose
length is not odd. The closure of the odd length strings of a's and b's under
concatenation is the set of all strings of a's and b's.

• Let A be the set of all strings of a's. So A = {a, aa, aaa, aaaa, aaaaa, …}. Let
S be the set that contains all subsets SS of A where SS contains an odd number
of elements. So S = {{a}, {aa}, {aaa}, …, {a, aa, aaa}, …}. S is not closed
under union, since, for example, {a} ∪ {aa} = {a, aa}, which is not in S, since
it contains an even number of elements. The closure of S under union is the set
of all nonempty finite sets in 𝒫(A).
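
When a closure under a function happens to be finite, it can be computed by repeatedly applying the function until nothing new appears. The sketch below illustrates that idea; the name closure_under and the two modular examples are hypothetical illustrations, not taken from the text, and the loop terminates only when the closure is finite.

```python
from itertools import product


def closure_under(seed, f, arity):
    """Smallest superset of `seed` that is closed under the `arity`-argument function f.

    Assumption: the closure is finite; otherwise this loop never terminates.
    """
    closed = set(seed)
    changed = True
    while changed:
        changed = False
        # Take a snapshot of the current elements so we can add to `closed` safely.
        for args in product(tuple(closed), repeat=arity):
            value = f(*args)
            if value not in closed:
                closed.add(value)
                changed = True
    return closed


# Hypothetical finite examples: {2} under multiplication modulo 7,
# and {0} under a successor function that wraps around at 5.
mult_mod_7 = closure_under({2}, lambda x, y: (x * y) % 7, arity=2)
succ_mod_5 = closure_under({0}, lambda x: (x + 1) % 5, arity=1)
print(sorted(mult_mod_7))
print(sorted(succ_mod_5))
```

The same function would loop forever on {0} with the ordinary successor function, which matches the observation that the closure of {0} under succ is the infinite set ℕ.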


Given a set S and a property P, we may want to compute the closure of S with respect
to P. For example, we will often want to compute the transitive closure of a binary relation
R on a set A. This is harder than it seems. We can't just add a fixed number of elements to
R and then quit. Every time we add a new element, such as (x, y), we have to look to see
whether there is some element (y, z). If so, we also have to add (x, z). And, similarly, we
must check for any element (w, x) that would force us to add (w, y). If R is infinite,
there is no guarantee that this process will ever terminate. Theorem A.5 (presented in
Section A.8) guarantees that a unique closure exists but it does not guarantee that the
closure will contain a finite number of elements and thus be computable in a finite
amount of time.

We can, however, guarantee that the transitive closure of any finite binary relation is
computable. How? A very simple approach is the following algorithm for computing
the transitive closure of a binary relation R with n elements on a set A. If t is an ordered
pair, then t.first will refer to the first element of the pair and t.second will refer to the
second element.

computetransitiveclosure(R: relation) =

1. trans = R.                      /* Initially trans is just the original relation.

   /* We need to find all cases where (x, y) and (y, z) are in trans. Then we must
   /* insert (x, z) into trans if it isn't already there.

2. addedSomething = True.          /* Keep going until we make one whole pass
                                   /* without adding any new elements to trans.

3. While addedSomething = True do:

   3.1. addedSomething = False.

   3.2. For each element t1 of trans do:
           For each element t2 of trans do:           /* Compare t1 to every other element of trans.
              If t1.second = t2.first then do:        /* We have (x, y) and (y, z).
                 If (t1.first, t2.second) ∉ trans     /* We have to add (x, z).
                 then do:
                    Insert(trans, (t1.first, t2.second)).
                    addedSomething = True.

This algorithm is straightforward and correct, but it may be inefficient. There are
more efficient algorithms. In particular, if we represent a relation as an adjacency
matrix, we can do better using Warshall's algorithm, which finds the transitive closure
of a relation over a set of n elements using approximately 2n³ bit operations.
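
The two approaches just described can be sketched in Python as follows. This is an illustrative sketch, not the book's code: relations are assumed to be Python sets of pairs, the function names are hypothetical, and the Warshall version takes an explicit list of the underlying elements so that it can build a Boolean adjacency matrix.

```python
def transitive_closure_naive(relation):
    """Repeat whole passes over all pairs until one pass adds nothing new."""
    trans = set(relation)
    added_something = True
    while added_something:
        added_something = False
        for (x, y) in list(trans):
            for (y2, z) in list(trans):
                # We have (x, y) and (y, z), so (x, z) must be present.
                if y == y2 and (x, z) not in trans:
                    trans.add((x, z))
                    added_something = True
    return trans


def transitive_closure_warshall(relation, elements):
    """Warshall's algorithm on a Boolean adjacency matrix over `elements`."""
    index = {e: i for i, e in enumerate(elements)}
    n = len(elements)
    reach = [[False] * n for _ in range(n)]
    for (x, y) in relation:
        reach[index[x]][index[y]] = True
    for k in range(n):              # allow element k as an intermediate point
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return {(elements[i], elements[j])
            for i in range(n) for j in range(n) if reach[i][j]}


# The relation from Example A.9.
R = {(1, 2), (2, 3), (3, 4)}
closure = transitive_closure_naive(R)
reflexive = R | {(x, x) for x in (1, 2, 3, 4)}
print(sorted(closure))
```

Both functions return the same set on finite input; the Warshall version does a fixed amount of work that is cubic in the number of elements, while the naive version may make many passes.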

In Section A.8, we present a more general definition of closure that includes, as special
cases, the two specific definitions presented here. We also elaborate on some of the
claims that we have just made.


A.6 Proof Techniques

In  this  section  we  summarize  the  most  important  proof  techniques  that  we  will  use  in 
the  rest  of  this  book. 


A.6.1 Proof by Construction

Suppose that we want to prove an assertion of the form ∃x (P(x)) or ∀x (∃y (P(x, y))).
One way to prove such a claim is to show a (provably correct) algorithm that finds the
value that we claim must exist. We call that technique proof by construction.

For example, we might wish to prove that every pair of integers has a greatest common
divisor. We could prove that claim by exhibiting (as we do in Example 27.6) a correct
greatest common divisor algorithm. In exhibiting such an algorithm, we show not
only that the greatest common divisor exists (since the algorithm provably finds one
for every input pair), we show something more: a method to determine the greatest
common divisor for any pair of integers. While this is a stronger claim than the one we
started with, it is often the case that such stronger claims are easier to prove.
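
As an illustration of what such a constructive witness can look like, here is a minimal sketch of Euclid's algorithm in Python. It is not necessarily the algorithm given in Example 27.6; the function name gcd and the convention that gcd(0, 0) is 0 are assumptions made for this sketch.

```python
def gcd(a, b):
    """Euclid's algorithm: return the nonnegative greatest common divisor of a and b.

    Assumed convention for this sketch: gcd(0, 0) is taken to be 0.
    """
    a, b = abs(a), abs(b)
    while b != 0:
        # Any common divisor of a and b also divides a % b, and vice versa.
        a, b = b, a % b
    return a


print(gcd(48, 18))
print(gcd(17, 5))
```

Because the second argument strictly decreases on every iteration, the loop always terminates, which is what makes the algorithm usable as an existence proof.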


A.6.2 Proof by Contradiction

Suppose that we want to prove some assertion P. One approach is to assume, to the
contrary, that ¬P were true. We then show, with that assumption, that we can derive a
contradiction. The law of the excluded middle says that (P ∨ ¬P). If we accept it, and
we shall, then, since ¬P cannot be true, P must be.


EXAMPLE  A.12  There  is  an  Infinite  Number  of  Primes 

Consider the claim that there is an infinite number of prime numbers. Following Euclid,
we prove this claim by assuming, to the contrary, that the set P of prime numbers
is finite. So there exists some value of n such that P = {p₁, p₂, p₃, …, pₙ}. Let:

q = (p₁p₂p₃…pₙ) + 1.

Since q is greater than each pᵢ, it is not on the list of primes. So it must be composite.
In that case, it must have at least one prime factor, which must then be an
element of P. Suppose that factor is pₖ, for some k ≤ n. Then q must have at least
one other factor, some integer i such that:

q = i·pₖ.
(p₁p₂p₃…pₙ) + 1 = i·pₖ.
(p₁p₂p₃…pₙ) − i·pₖ = −1.

Now observe that pₖ divides both terms on the left since it is prime and so must
be in the set {p₁, p₂, p₃, …, pₙ}. Factoring it out, we get:

pₖ(p₁p₂…pₖ₋₁pₖ₊₁…pₙ − i) = −1.

pₖ = −1/(p₁p₂…pₖ₋₁pₖ₊₁…pₙ − i).

But, since (p₁p₂…pₖ₋₁pₖ₊₁…pₙ − i) is an integer, this means that |pₖ| ≤ 1. But
that cannot be true since pₖ is prime and thus greater than 1. So q is not composite.
Since q is greater than 1 and not composite, it must be prime, contradicting the
assumption that all primes are in the set {p₁, p₂, p₃, …, pₙ}.


Notice  that  this  proof,  in  addition  to  being  a  proof  by  contradiction,  is  constructive. 
It  exhibits  a  specific  example  that  contradicts  the  initial  assumption. 
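
The constructive flavor of Euclid's argument can be sketched in Python. Given any actual finite list of primes, q = (product of the list) + 1 need not itself be prime, but every prime factor of q is missing from the list, so the smallest one serves as a witness. The function names below are hypothetical, and trial division is used only because it is simple.

```python
from math import prod


def smallest_prime_factor(q):
    """Smallest prime factor of an integer q >= 2, found by trial division."""
    d = 2
    while d * d <= q:
        if q % d == 0:
            return d
        d += 1
    return q  # no divisor up to sqrt(q), so q itself is prime


def prime_missing_from(primes):
    """Return a prime that does not occur in the given finite list of primes."""
    q = prod(primes) + 1
    # Every listed prime leaves remainder 1 when dividing q,
    # so no prime factor of q can be on the list.
    return smallest_prime_factor(q)


print(prime_missing_from([2, 3, 5]))
print(prime_missing_from([2, 3, 5, 7, 11, 13]))
```

For the second list, q = 30031 = 59 × 509 is composite, which shows why the witness is taken to be a prime factor of q rather than q itself.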


EXAMPLE A.13 √2 is Irrational

Consider the claim that √2 is irrational. We prove this claim by assuming, to the
contrary, that √2 is rational. In that case, it is the quotient of two integers, i and j.
So we have:

√2 = i/j.

If i and j have any common factors, then reduce them by those factors. Now we have:

√2 = k/n, where k and n have no common factors.

2 = k²/n².

2n² = k².

Since 2 is a factor of k², k² must be even and so k is even. Since k is even, we
can rewrite it as 2m for some integer m. Substituting 2m for k, we get:

2n² = (2m)².

2n² = 4m².

n² = 2m².

So n² is even and thus n is even. But now both k and n are even and so have 2 as a
common factor. But we had reduced them until they had no common factors. The
assumption that √2 is rational has led to a contradiction. So √2 cannot be rational.


A.6.3 Proof by Counterexample

Consider any claim of the form ∀x (P(x)). Such a claim is false iff ∃x (¬P(x)). We can
prove that it is false by finding such an x.


EXAMPLE  A.14  Mersenne  Primes 

Let M be the set of numbers of the form 2ⁿ − 1 for some positive integer n. M is
also called the set of Mersenne numbers. Now consider only those cases in
which n is prime. (In fact, some authors restrict the term Mersenne number only
to those cases.) Consider two statements:

1. If n is prime, then 2ⁿ − 1 is prime.

2. If 2ⁿ − 1 is prime, then n is prime.

Statement 2 is true. But what about statement 1? Hundreds of years ago,
some mathematicians believed that it was true, although they had no proof of it.
Then, in 1536, Hudalricus Regius refuted the claim by showing a counterexample:
2¹¹ − 1 = 2047 is not prime. But that was not the end of false conjectures about
these numbers. The elements of M that are also prime are called Mersenne primes,
after the monk Marin Mersenne, who, in 1644, made the claim that numbers of
the form 2ⁿ − 1 are prime if n = 2, 3, 5, 7, 13, 17, 19, 31, 67, 127, and 257, but are
composite for all other positive integers n ≤ 257. Mersenne's claim was shown to
be false by counterexample, over two hundred years later, when it was discovered
that 2⁶¹ − 1 is also prime. Later discoveries showed other ways in which
Mersenne was wrong. The correct list of values of n ≤ 257 such that 2ⁿ − 1 is
prime is 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, and 127.
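
The corrected list can be checked mechanically. The sketch below uses the Lucas-Lehmer test, a standard primality test for numbers of the form 2^p − 1 with p prime; it is not described in the text and is brought in here only as a convenient checking tool. The function names are hypothetical. Because statement 2 holds, only prime exponents need to be examined.

```python
def is_prime(n):
    """Trial-division primality check, adequate for small exponents."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def mersenne_is_prime(p):
    """Lucas-Lehmer test: for a prime exponent p, decide whether 2**p - 1 is prime."""
    if p == 2:
        return True  # 2**2 - 1 = 3; the test below applies to odd primes only
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


# Exponents n <= 257 for which 2**n - 1 is prime.
exponents = [n for n in range(2, 258) if is_prime(n) and mersenne_is_prime(n)]
print(exponents)

# The 1536 counterexample to statement 1: 11 is prime but 2**11 - 1 is not.
print(2**11 - 1, 23 * 89)
```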


EXAMPLE  A.15  All  it  Takes  is  One  Counterexample 
Consider  the  following  claim: 

Let  A,  B,  and  C  be  any  sets.  If  A  -  C  =  A  -  B  then  B  =  C. 

We show that this claim is false with a counterexample: Let A = ∅, B = {1},
and C = {2}. A − C = A − B = ∅. But B ≠ C.
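
The counterexample is small enough to check directly. The following sketch simply evaluates the premise and the conclusion of the claim using Python's built-in set type; the variable names premise and conclusion are introduced here for illustration.

```python
A, B, C = set(), {1}, {2}

premise = (A - C) == (A - B)   # both differences are the empty set
conclusion = (B == C)

# The claim "premise implies conclusion" fails exactly when
# the premise holds and the conclusion does not.
print(premise, conclusion)
```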


A.6.4 Proof by Case Enumeration

Consider a claim of the form ∀x ∈ A (P(x)). Sometimes the most straightforward way
to prove that P holds for all elements of A is to divide A into two or more subsets and
then to prove P separately for each subset.

EXAMPLE A.16 The Postage Stamp Problem

Suppose that the postage required to mail a letter is always at least 6¢. Prove that
it is possible to apply any required postage to a letter given only 2¢ and 7¢ stamps.

We prove this general claim by dividing it into two cases, based on the value of
n, the required postage:

1. If n is even (and 6¢ or more), apply n/2 2¢ stamps.

2. If n is odd (and 6¢ or more), then n ≥ 7 and n − 7 ≥ 0 and is even. 7¢ can
be applied with one 7¢ stamp. Apply one 7¢ stamp and (n − 7)/2 2¢ stamps.
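
The two cases translate directly into a small function with one branch per case. This is a sketch; the name stamps_for and the choice to return a pair (number of 2¢ stamps, number of 7¢ stamps) are assumptions made for illustration.

```python
def stamps_for(n):
    """Return (twos, sevens) such that 2*twos + 7*sevens == n, for integer n >= 6."""
    if n < 6:
        raise ValueError("the claim only covers postage of 6 cents or more")
    if n % 2 == 0:
        # Case 1: n is even, so n/2 two-cent stamps suffice.
        return (n // 2, 0)
    # Case 2: n is odd, so n >= 7 and n - 7 is even and nonnegative.
    return ((n - 7) // 2, 1)


for n in (6, 7, 9, 20):
    print(n, stamps_for(n))
```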


A.6.5  Mathematical  Induction 

The principle of mathematical induction states:

If:    P(b) is true for some integer base case b, and

       for all integers n ≥ b, P(n) → P(n + 1)

Then:  for all integers n ≥ b, P(n).

A proof, using mathematical induction, of an assertion P about some set of positive
integers greater than or equal to some specific value b, has three parts:

1. A clear statement of the assertion P.

2. A proof that P holds for some base case b, the smallest value with which we
are concerned. Often, b = 0 or 1, but sometimes P may hold only once we get
past some initial unusual cases.

3. A proof that, for all integers n ≥ b, if P(n) then it is also true that P(n + 1).
We'll call the claim P(n) the induction hypothesis.


EXAMPLE A.17 The Sum of the First n Odd Positive Integers is n²

Consider the claim that the sum of the first n odd positive integers is n². We
first check for plausibility:

(n = 1) 1 = 1 = 1².

(n = 2) 1 + 3 = 4 = 2².

(n = 3) 1 + 3 + 5 = 9 = 3².

(n = 4) 1 + 3 + 5 + 7 = 16 = 4², and so forth.

The claim appears to be true, so we should prove it. Let Oddᵢ = 2(i − 1) + 1
denote the i-th odd positive integer. Then we can rewrite the claim as:

∀n ≥ 1 (Σ_{i=1}^{n} Oddᵢ = n²).

The proof of the claim is by induction on n:

• Base case: Take 1 as the base case. 1 = 1².

• Prove: ∀n ≥ 1 ((Σ_{i=1}^{n} Oddᵢ = n²) → (Σ_{i=1}^{n+1} Oddᵢ = (n + 1)²)).

Observe that the sum of the first n + 1 odd integers is the sum of the first n of
them plus the (n + 1)-st, so:

Σ_{i=1}^{n+1} Oddᵢ = Σ_{i=1}^{n} Oddᵢ + Oddₙ₊₁

                   = n² + Oddₙ₊₁.     (Using the induction hypothesis.)

                   = n² + 2n + 1.     (Since Oddₙ₊₁ is 2(n + 1 − 1) + 1 = 2n + 1.)

                   = (n + 1)².
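
A finite check is not a proof, but it can confirm that both the claim and the algebra of the induction step are stated correctly. The sketch below uses the definition Oddᵢ = 2(i − 1) + 1 from the example; the function names are hypothetical.

```python
def odd(i):
    """The i-th odd positive integer, using the definition from the example."""
    return 2 * (i - 1) + 1


def sum_first_odds(n):
    """Sum of the first n odd positive integers, computed term by term."""
    return sum(odd(i) for i in range(1, n + 1))


# The claim itself, for the first few hundred values of n.
claim_holds = all(sum_first_odds(n) == n * n for n in range(1, 300))

# The induction step: n^2 + Odd_(n+1) = (n + 1)^2.
step_holds = all(n * n + odd(n + 1) == (n + 1) ** 2 for n in range(1, 300))

print(claim_holds, step_holds)
```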


Mathematical induction lets us prove properties of positive integers. But it also lets
us prove properties of other things if the properties can be described in terms of integers.
For example, we could talk about the cardinality of a finite set, or the length of a
finite string.


EXAMPLE  A.18  The  Cardinality  of  the  Power  Set  of  a  Finite  Set 

Let A be any finite set. We prove the following claim about the cardinality of the
power set of A:

|𝒫(A)| = 2^|A|.

The proof is by induction on |A|, the cardinality of A.

• Base case: Take 0 as the base case. |A| = 0, A = ∅, and 𝒫(A) = {∅}, whose
cardinality is 1 = 2⁰ = 2^|A|.

• Prove: ∀n ≥ 0 ((|𝒫(A)| = 2^|A| for all sets A of cardinality n) → (|𝒫(A)|
= 2^|A| for all sets A of cardinality n + 1)).

We do this as follows. Consider any value of n ≥ 0 and any set A with n + 1
elements. Since n ≥ 0, A must have at least one element. Pick one and call it a.
Now consider the set B that we get by removing a from A. |B| must be n. So, by the
induction hypothesis, |𝒫(B)| = 2^|B|. Now return to 𝒫(A).

It has two parts: those subsets of A that include a and those that don't. The second
part is exactly 𝒫(B), so we know that it has 2^|B| = 2ⁿ elements. The first part (all
the subsets that include a) is exactly all the subsets that don't include a (with a
added in). Since there are 2ⁿ subsets that don't include a and there are the same
number of them once we add a to each, we have that the total number of subsets of
our original set A is 2ⁿ (for the ones that don't include a) plus another 2ⁿ (for the
ones that do include a), for a total of 2(2ⁿ) = 2ⁿ⁺¹, which is exactly 2^|A|.
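
The split used in the induction step (subsets without a, and the same subsets with a added in) is also a recipe for building a power set recursively. The following sketch mirrors that split; the function name power_set and the use of frozenset are choices made for this illustration.

```python
def power_set(elements):
    """Build the power set recursively, mirroring the two-part split in the proof."""
    elements = list(elements)
    if not elements:
        return [frozenset()]                   # the power set of the empty set is {empty set}
    a, rest = elements[0], elements[1:]
    without_a = power_set(rest)                # exactly the power set of B = A - {a}
    with_a = [s | {a} for s in without_a]      # the same subsets with a added in
    return without_a + with_a


for n in range(5):
    print(n, len(power_set(range(n))))
```

Each recursive call doubles the number of subsets, which is the computational counterpart of the step 2(2ⁿ) = 2ⁿ⁺¹.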


Mathematical induction can be used to prove properties of a linear sequence of objects
by assigning to each object its index in the sequence.


EXAMPLE A.19 Generalized Modus Tollens

Recall the inference rule we call modus tollens: From (P → Q) and ¬Q, conclude
¬P. We can use mathematical induction to prove a generalization of modus tollens
to an arbitrary chain of implications. Suppose that we know, for any value of
n ≥ 2, two things:

∀i, where 1 ≤ i ≤ n − 1, (Pᵢ → Pᵢ₊₁)     /* In a chain of n propositions,
                                         /* each implies the next.

¬Pₙ                                      /* The last proposition is known
                                         /* to be false.

Then generalized modus tollens will let us conclude that all the preceding
propositions are also false, and so, in particular, it must be the case that:

¬P₁.

We can use induction to prove this rule. To make it easy to describe the rule as
we work, we'll introduce the notation P ⊢ Q to mean that, from P, we can derive
Q. Using this notation, we can state concisely the rule we are trying to prove:

∀n ≥ 2 (((∀i < n (Pᵢ → Pᵢ₊₁)) ∧ ¬Pₙ) ⊢ ¬P₁).

The proof is by induction on n, the number of propositions.

• Base case: Take 2 as the base case. We have P₁ → P₂ and ¬P₂. So, using modus
tollens, we conclude ¬P₁.

• Prove that if the claim is true for n propositions it must be true for n + 1 of
them:

(((∀i < n (Pᵢ → Pᵢ₊₁)) ∧ ¬Pₙ) ⊢ ¬P₁) → (((∀i < n + 1 (Pᵢ → Pᵢ₊₁)) ∧ ¬Pₙ₊₁) ⊢ ¬P₁).

((Pₙ → Pₙ₊₁) ∧ ¬Pₙ₊₁) ⊢ ¬Pₙ                                   (Modus tollens)
((∀i < n (Pᵢ → Pᵢ₊₁)) ∧ ¬Pₙ) ⊢ ¬P₁                            (Induction hypothesis)
((∀i < n (Pᵢ → Pᵢ₊₁)) ∧ (Pₙ → Pₙ₊₁) ∧ ¬Pₙ₊₁) ⊢ ¬P₁            (Chaining)
((∀i < n + 1 (Pᵢ → Pᵢ₊₁)) ∧ ¬Pₙ₊₁) ⊢ ¬P₁                      (Simplification)

Mathematical induction relies on the fact that any subset of the nonnegative integers
forms a well-ordered set (as defined in Section A.3.5) under the relation ≤. Once
we have done an induction proof, we know that A(b) (where b is typically 0 or 1, but it
could be some other starting value) is true and we know that ∀n ≥ b (A(n) →
A(n + 1)). Then we claim that ∀n ≥ b (A(n)). Suppose that the principle of mathematical
induction were not sound and there existed some nonempty set S of nonnegative integers
≥ b for which A(n) is false. Then, since S is well-ordered, it has a least element, which
we can call x. By definition of S, x must be equal to or greater than b. But it cannot
actually be b because we proved A(b). So it must be greater than b. Now consider x − 1.
Since x − 1 is less than x, it cannot be in S (since we chose x to be the smallest value in
S). If x − 1 is not in S, then we know A(x − 1). But we proved that
∀n ≥ b (A(n) → A(n + 1)), so A(x − 1) → A(x). But we assumed ¬A(x). So that
assumption led us to a contradiction and thus must be false.

Sometimes the principle of mathematical induction is stated in a slightly different
but formally equivalent way:

If:    A(b) is true for some integer value b, and

       for all integers n ≥ b ((A(k) is true for all integers k where
       b ≤ k ≤ n) → A(n + 1)),

Then:  for all integers n ≥ b (A(n)).

This form of mathematical induction is sometimes called strong induction. To use it,
we prove that whenever A holds for all nonnegative integers starting with b, up to and
including n, it must also hold for n + 1. We can use whichever form of the technique is
easiest for a particular problem.


A.6.6 The Pigeonhole Principle

Suppose that we have n pigeons and k holes. Each pigeon must fly into a hole. If n > k,
then there must be at least one hole that contains more than one pigeon. We call this
obvious observation the pigeonhole principle. More formally, consider any function
f: A → B. The pigeonhole principle says:

If |A| > |B| then f is not one-to-one.

The pigeonhole principle is a useful technique for proving relationships between
sets. For example, suppose that set A is the set of all students who live in the dorm. Set
B is the set of rooms in the dorm. The function f maps each student to a dorm room. So,
if |A| > |B|, we can use the pigeonhole principle to show that some students have
roommates. As another everyday use of the principle, consider: If there are more than
366 people in a class, then at least two of them must share a birthday. The pigeonhole
principle is also useful in proving less obvious claims.


EXAMPLE  A.20  The  Coins  and  Balance  Problem 

Consider the following problem: You have three coins. You know that two are of
equal weight; the third is different. You do not know which coin is different and
you do not know whether it is heavier or lighter than the other two. Your task is to
identify the different coin and to say whether it is heavier or lighter than the others.
The only tool you have is a balance, with two pans, onto which you may place
one or more objects. The balance has three possible outputs: left pan heavier than
right pan, right pan heavier than left pan, both pans the same weight. Show that
you cannot solve this problem in a single weighing.

There are six possible situations: There are three coins, any one of which could
be different, and the different coin can be either heavier or lighter. But a single
weighing (no matter how you choose to place coins on pans) has only three possible
outcomes. So there is at least one outcome that corresponds to at least two situations.
Thus one weighing cannot be guaranteed to determine the situation
uniquely.
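
The counting argument can be confirmed by brute force: enumerate every way of placing the three coins and check that no placement sends the six situations to six different outcomes. In this sketch the specific weights (10 for a normal coin, 11 or 9 for the different one) and all names are hypothetical choices made for illustration.

```python
from itertools import product

COINS = (0, 1, 2)
# A situation names the different coin and whether it is heavier (+1) or lighter (-1).
SITUATIONS = [(coin, delta) for coin in COINS for delta in (+1, -1)]


def outcome(placement, situation):
    """placement[c] is 'L', 'R', or '-' (left pan, right pan, off the balance)."""
    odd_coin, delta = situation

    def weight(c):
        # Assumed weights: 10 for a normal coin, 10 + delta for the different one.
        return 10 + (delta if c == odd_coin else 0)

    left = sum(weight(c) for c in COINS if placement[c] == 'L')
    right = sum(weight(c) for c in COINS if placement[c] == 'R')
    if left > right:
        return 'left heavier'
    if right > left:
        return 'right heavier'
    return 'balanced'


def distinguishes_all(placement):
    """True iff the map from situations to outcomes is one-to-one."""
    outcomes = {outcome(placement, s) for s in SITUATIONS}
    return len(outcomes) == len(SITUATIONS)


some_weighing_works = any(distinguishes_all(p) for p in product('LR-', repeat=3))
print(len(SITUATIONS), some_weighing_works)
```

Since a weighing has at most three distinct outcomes and there are six situations, the check must fail for every placement, which is exactly what the pigeonhole principle predicts.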


A.6.7  Showing  That  Two  Sets  Are  Equal 

A  great  deal  of  what  we  do  when  we  build  a  theory  about  some  domain  is  to  prove  that 
various  sets  of  objects  in  that  domain  are  equal.  For  example,  in  our  study  of  automata 
theory,  we  are  going  to  want  to  prove  assertions  such  as  the  following. 

• The set of strings defined by some regular expression α is identical to the set of
strings defined by some second regular expression β.

•  The  set  of  strings  that  will  be  accepted  by  some  given  finite  state  machine  M  is  the 
same  as  the  set  of  strings  that  will  be  accepted  by  some  new  finite  state  machine  M' 
that  has  fewer  states  than  M  has. 

• The set of languages that can be defined using regular expressions is the same as
the set of languages that can be accepted by a finite state machine.

•  The  set  of  problems  that  can  be  solved  by  a  Turing  Machine  with  a  single  tape  is  the 
same  as  the  set  of  problems  that  can  be  solved  by  a  Turing  Machine  with  any  finite 
number  of  tapes. 

So we become very interested in the question, “How does one prove that two
sets are identical?” There are lots of ways and many of them require special techniques
that apply in specific domains. But it is worth mentioning two very general
approaches here.

Sometimes we want to compare apples to apples. We may, for example, want to
prove that two sets of strings are identical, even though they may have been derived
differently. In this case, one approach is to use the set identity theorems that we have
already described. Suppose, for example, that we want to prove that:

A ∪ (B ∩ (A ∩ C)) = A.


We can prove this as follows:

A ∪ (B ∩ (A ∩ C)) = (A ∪ B) ∩ (A ∪ (A ∩ C)).     (Distributivity)

                  = (A ∪ B) ∩ ((A ∩ C) ∪ A).     (Commutativity)

                  = (A ∪ B) ∩ A.                 (Absorption)

                  = A.                           (Absorption)


Sometimes, even when we're comparing apples to apples, the theorems we have listed
are not enough. In these cases, we need to use the definitions of the operators. Suppose,
for example, that we want to prove that:

A − B = A ∩ ¬B.


We can prove this as follows (where U stands for the universe with respect to which
we take complement):

A − B = {x : x ∈ A and x ∉ B}.

      = {x : x ∈ A and (x ∈ U and x ∉ B)}.

      = {x : x ∈ A and x ∈ U − B}.

      = {x : x ∈ A and x ∈ ¬B}.

      = A ∩ ¬B.
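
Identities such as the two just proved can be sanity-checked by exhaustive enumeration over a small universe. This is not a substitute for the proofs, since it covers only one finite universe, but it catches mistakes in how an identity is written down. The universe U = {0, 1, 2, 3} and the helper names below are assumptions made for this sketch.

```python
from itertools import combinations

U = frozenset(range(4))  # a small universe, chosen arbitrarily for the check


def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def complement(s):
    """Complement taken with respect to the universe U."""
    return U - s


ALL = subsets(U)

# A ∪ (B ∩ (A ∩ C)) = A, for every choice of A, B, C.
absorption_holds = all(a | (b & (a & c)) == a
                       for a in ALL for b in ALL for c in ALL)

# A − B = A ∩ ¬B, for every choice of A, B.
difference_holds = all(a - b == a & complement(b)
                       for a in ALL for b in ALL)

print(absorption_holds, difference_holds)
```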

Sometimes, though, our problem is more complex. We may need to compare apples
to oranges. In other words, we may need to compare sets that aren't even defined in the
same terms. For example, we will want to be able to prove that A: {the set of languages
that can be defined using regular expressions} is the same as B: {the set of languages
that can be accepted by a finite state machine}. This seems very hard: Regular expressions,
which we describe in Chapter 6, are strings that look like:

a*(b ∪ ba)*

Finite state machines, which we describe in Chapter 5, are collections of states and
rules for moving from one state to another. How can we possibly prove that A (defined
in terms of regular expressions) and B (defined in terms of finite state machines) are the
same set? The answer is that we can show that any two sets are equal by showing that
each is a subset of the other. So, to prove that A = B, we will show first that, given a regular
expression, we can construct a finite state machine that accepts exactly the strings
that the regular expression describes. That gives us A ⊆ B. But there might still be some
finite state machines that don't correspond to any regular expressions. So we then show
that, given a finite state machine, we can construct a regular expression that defines exactly
the same strings that the machine accepts. That gives us B ⊆ A. In Section 6.2, we
describe both of these proofs and use them to prove the claim, called Kleene's Theorem,
that A = B. We will use the same technique several more times throughout the book.


A.6.8 Showing That a Set is Finite or Countably Infinite

Next, let's return briefly to the question, “What is the cardinality of a set?” In this
book, we will be concerned with three cases:

• finite sets,

• countably infinite sets, and

• uncountably infinite sets.

We will use the following definitions for the terms “finite” and “infinite”:¹⁴

A set A is finite and has cardinality n ∈ ℕ (the natural numbers) iff either A = ∅
or there is a bijection from {1, 2, …, n} to A, for some value of n. Alternatively, a
set is finite if we can count its elements and finish. The cardinality of a finite set is
simply a natural number whose value is the number of elements in the set.

A set is infinite iff it is not finite. The first infinite set we'll consider is ℕ, the natural
numbers. Following Cantor, we'll call the cardinality of ℕ ℵ₀. (Read this as
“aleph null”. Aleph is the first symbol of the Hebrew alphabet.)

Now consider an arbitrary set A. We'll say that A is countably infinite and also has
cardinality ℵ₀ iff there exists some bijection f: ℕ → A. And we need one more definition:
A set is countable iff it is either finite or countably infinite. We use the term
“countable” because the elements of a countable set can be counted with the integers.
To prove that a set A is countably infinite, it suffices to find a bijection from ℕ to it.

EXAMPLE  A.21  There  is  a  Countably  Infinite  Number  of  Even 
Numbers 

The set E of even natural numbers is countably infinite. To prove this, we offer the
bijection:

Even: ℕ → E.

Even(x) = 2x.

So we have the following mapping from ℕ to E:

ℕ     E
0     0
1     2
2     4
3     6
…     …

¹⁴ An alternative is to begin by saying that a set A is infinite iff there exists a one-to-one mapping from A into
a proper subset of itself. Then a set is finite iff it is not infinite. With the axiom of choice, these two definitions
are equivalent.

The last example was easy. The bijection was obvious. Sometimes it is less so. In
harder cases, a good way to think about the problem of finding a bijection from ℕ to
some set A is to turn it into the problem of finding an enumeration of A.

An enumeration of a set A is simply a list of the elements of A in some order.
Each element of A must occur in the enumeration exactly once. Of course, if A is
infinite, the enumeration will be infinite. But as long as we can guarantee that
every element of A will show up eventually, we have an enumeration.


THEOREM  A.1  Infinite  Enumeration  and  Countable  Infinity 

Theorem:  A  set  A  is  countably  infinite  iff  there  exists  an  infinite  enumeration  of  it. 

Proof:  We  prove  the  if  and  only-if  parts  separately. 

If A is countably infinite, then there exists an infinite enumeration of it: Since
A is countably infinite, there exists a bijection f from ℕ to it. We construct an infinite
enumeration of A as follows (where the only slight issue is that we number the elements
of an enumeration starting with 1 and the natural numbers start with 0): For
all i ≥ 1, the i-th element of the enumeration of A will be f(i − 1). So the first element
of the enumeration will be the element that 0 maps to, the second element of
the enumeration will be the element that 1 maps to, and so forth.

If there exists an infinite enumeration E of A, then A is countably infinite.
Define f: ℕ → A, where f(i) is the (i + 1)-th element of the list E. The function f is
a bijection from ℕ to A, so A is countably infinite.

We can use Theorem A.1 both to show that a set is countably infinite (by exhibiting
an infinite enumeration of it) and to show that a set is not countably infinite (by showing
that no infinite enumeration of it exists).

THEOREM  A.2  Finite  Union 

Theorem:  The  union  U  of  a  finite  number  of  countably  infinite  sets  is  countably 
infinite. 

Proof:  The  proof  is  by  enumeration  of  the  elements  of  U.  We  need  a  technique  for 
producing  that  enumeration.  The  simplest  thing  to  do  would  be  to  start  by 
enumerating  all  the  elements  of  the  first  set,  then  all  the  elements  of  the  second, 
etc. But, since the first set is infinite, we will never get around to considering any of
the elements of the other sets. We need another technique. We take the first
element from each of the sets, then the second element from each, and so forth,
checking before inserting each element to make sure that it is not already there.
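
The round-robin technique used in this proof can be sketched in Python. This is only an illustration under stated assumptions: each set is assumed to be available as an infinite iterator, and the name enumerate_finite_union and the two sample sets are hypothetical, not part of the text.

```python
from itertools import count, islice


def enumerate_finite_union(*enumerations):
    """Yield each element of the union exactly once, taking the sets in turn.

    Each argument is assumed to be an infinite iterator that enumerates one
    countably infinite set.
    """
    seen = set()
    while True:
        # Take the next element from each set before returning to the first.
        for enumeration in enumerations:
            element = next(enumeration)
            if element not in seen:  # skip elements that are already listed
                seen.add(element)
                yield element


evens = (2 * i for i in count())            # 0, 2, 4, ...
multiples_of_3 = (3 * i for i in count())   # 0, 3, 6, ...
first_ten = list(islice(enumerate_finite_union(evens, multiples_of_3), 10))
print(first_ten)  # [0, 2, 3, 4, 6, 9, 8, 12, 10, 15]
```

Every element of either set is reached after finitely many rounds, which is the property the proof relies on.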

Using a technique similar to the one we just used to prove Theorem A.2, it is easy
to show that for any fixed n, the set of ordered n-tuples of elements drawn from a
countably infinite set must also be countably infinite. So, for example, the rational
numbers are countably infinite.

FIGURE A.9 Systematically enumerating the elements of an infinite number of
infinite sets.


THEOREM  A.3  Countably  Infinite  Union 

Theorem:  The  union  U  of  a  countably  infinite  number  of  countably  infinite  sets  is 
countably  infinite. 

Proof:  The  proof  is  by  enumeration  of  the  elements  of  U.  Now  we  cannot  use  the 
simple  enumeration  technique  that  we  used  in  the  proof  of  Theorem  A.2.  Since  we 
are  now  considering  an  infinite  number  of  sets,  if  we  tried  that  technique  we'd  never 
get to the second element of any of the sets. So we follow the arrows as shown in
Figure A.9. The numbers in the squares indicate the order in which we select
elements for the enumeration. This process goes on forever, but it is systematic and it
guarantees  that,  if  we  wait  long  enough,  any  element  of  any  of  the  sets  will 
eventually  be  enumerated.  Note  that,  before  we  actually  enter  any  element  into  the 
enumeration,  we  must  check  to  make  sure  that  it  has  not  already  been  generated. 
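
One systematic order of this kind can be sketched in Python. The sketch walks the finite anti-diagonals of the table of sets; this is an assumed traversal that may differ in detail from the arrows of Figure A.9. The function element(i, j), taken to return the j-th element of the i-th set, and the sample family of sets are hypothetical.

```python
from itertools import islice


def enumerate_countable_union(element):
    """Enumerate the union of sets S_0, S_1, ... without repetition.

    `element(i, j)` is assumed to return the j-th element of the i-th set.
    """
    seen = set()
    d = 0
    while True:
        # Visit every pair (i, j) with i + j == d: one finite anti-diagonal.
        for i in range(d + 1):
            x = element(i, d - i)
            if x not in seen:  # check that x has not already been generated
                seen.add(x)
                yield x
        d += 1


# Hypothetical family: S_i is the set of multiples of i + 2.
def multiples(i, j):
    return (i + 2) * j


first_eight = list(islice(enumerate_countable_union(multiples), 8))
print(first_eight)  # [0, 2, 4, 3, 6, 8, 9, 5]
```

The j-th element of the i-th set is reached no later than anti-diagonal i + j, so every element is eventually enumerated.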

It turns out that there are a lot of countably infinite sets. Some of them, like the even
natural numbers, appear at first to contain fewer elements than ℕ does. Some of them, like
the union of a countable number of countable sets, appear at first to be bigger. But in both
cases there is a bijection from ℕ to the elements of the set, so the cardinality of the set is ℵ₀.

A.6.9  Showing  That  a  Set  is  Uncountably  Infinite:  Diagonalization 

But not all infinite sets are countably infinite. There are sets with more than ℵ₀ elements.
There are more than ℵ₀ real numbers, for example. As another case, consider an arbitrary
set S with cardinality ℵ₀. Now consider 𝒫(S) (the power set of S). 𝒫(S) has cardinality
greater than ℵ₀. To prove this, we need to show that, although 𝒫(S) is infinite, there exists
no bijection from ℕ to it. To do this, we will use a technique called diagonalization.

Diagonalization is a kind of proof by contradiction. To show that a set A is not countably
infinite, we assume that it is, in which case there would be some enumeration of it.
Every element of A would have to be on that list somewhere. But we show how to construct
an element of A that cannot be on the list, no matter how the list was constructed.
Thus there exists no enumeration of A. So A is not countably infinite.


THEOREM  A.4  The  Cardinality  of  the  Power  Set 

Theorem: If S is a countably infinite set, 𝒫(S) (the power set of S) is infinite but not
countably infinite.

Proof: 𝒫(S) must be infinite because, for each of the infinitely many elements x of S,
the set {x} is an element of 𝒫(S).




[Figure A.10: (a) a subset of S shown as an infinite binary vector with one column per element of S (Elem 1 of S, Elem 2 of S, ...); (b) a table with one row per element of 𝒫(S) and one column per element of S, in which the squares along the diagonal are labeled (1), (2), (3), (4), (5), ...; (c) the vector ¬(1), ¬(2), ¬(3), ¬(4), ¬(5), ...]

FIGURE A.10 Using diagonalization to show the uncountability of a power set.


But now we must prove that 𝒫(S) is not countably infinite. The proof is by
diagonalization. Since S is countably infinite, by Theorem A.1, there exists an
infinite enumeration of it. We can use that enumeration to construct a representation
of each subset SS of S as an infinite binary vector that contains one
element for each element of the original set S. If SS contains element 1 of S,
then the first element of its vector will be 1, otherwise 0 (which we’ll show as
blank to make our tables easy to read). Similarly for all the other elements of S.
Of course, since S is countably infinite, the length of each vector will also be
countably infinite. Thus we might represent a particular subset SS of S as the
infinite vector shown in Figure A.10(a).

Now, assume that 𝒫(S) is countably infinite. Then there is some enumeration of
it. Pick any such enumeration, and write it as shown in Figure A.10(b) (where each
row represents one element of 𝒫(S) as described above; ignore for the moment the
numbers enclosed in parentheses). This table is infinite in both directions. Since it is
an enumeration of 𝒫(S), it must contain one row for each element of 𝒫(S). But it
doesn't. To prove that it doesn’t, we will construct L, an element of 𝒫(S) that is not
on the list. To do this, consider the numbers in parentheses along the diagonal of
the matrix of Figure A.10(b). Using them, we can construct L so that it corresponds
to the vector shown in Figure A.10(c). What we mean by ¬(1) is that if the square
labeled (1) is a 1, then 0; if the square labeled (1) is a 0, then 1.

So we've constructed the representation for an element of 𝒫(S). It must be an
element of 𝒫(S) since it describes a possible subset of S. But we’ve built it so that
it differs from the first element in the list of Figure A.10(b) by whether or not it
includes element 1 of S. It differs from the second element in the list by whether or
not it includes element 2 of S. And so forth. In the end, it must differ from every
element in the list in at least one place. Yet it represents an element of 𝒫(S). Thus
we have a contradiction. The list was not an enumeration of 𝒫(S). But since we
made no assumptions about it except that it was an enumeration of 𝒫(S), no such
enumeration can exist. In particular, if we try to fix the problem by simply adding
our new element L to the list, we can just turn around and do the same thing again
and create yet another element that is not on the list. Thus there are more than ℵ₀
elements in 𝒫(S).
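
The diagonal construction can be tried on a finite corner of such a table. The following Python sketch is only an illustration: the 4-by-4 table is invented, and a finite check of this kind shows the mechanism rather than proving the theorem.

```python
def diagonal_complement(rows):
    """Return a vector that differs from rows[i] in position i, for every i.

    `rows` is assumed to hold the first n rows of the table, each cut to its
    first n entries (0 or 1).
    """
    return [1 - rows[i][i] for i in range(len(rows))]


# Hypothetical finite corner of an enumeration of subsets of S.
table = [
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
]
L = diagonal_complement(table)
print(L)  # [0, 1, 0, 1]
```

Because L disagrees with row i in column i, it cannot equal any row of the table, however the rows were chosen.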


If a set S is infinite but not countably infinite then we will say that it is uncountably
infinite. So, for example, 𝒫(ℕ) is uncountably infinite, since, by Theorem A.4, the power
set of any countably infinite set is infinite but not countably infinite. The real numbers
are uncountably infinite, which can be shown with a proof that is very similar to the one
we just did for the power set except that it is a bit tricky because, when we write out
each number as an infinite sequence of digits (just as we wrote out each set above as an
infinite sequence of 0's and 1’s), we have to consider the fact that several distinct
sequences may represent the same number.

Not  all  uncountably  infinite  sets  have  the  same  cardinality. There  are  more  elements 
in  the  power  set  of  the  real  numbers  than  there  are  real  numbers,  for  example. 


A.7  Reasoning  about  Programs 

An algorithm is a detailed procedure that accomplishes some clearly specified task. A
program is an executable encoding of an algorithm. Not all algorithms halt. For example,
a monitoring system might be designed never to halt but to run constantly, looking
for some pattern of events to which some sort of response is required. So not all programs
are designed to halt. However, we will focus on the class of programs whose job
is to accept input, compute, and halt, having produced appropriate output. Useful programs
of this sort possess two kinds of properties:

1.  Correctness  properties,  including: 

• the program eventually halts, and

•  when  it  halts,  it  has  produced  the  desired  output. 

2.  Performance  properties,  including: 

•  time  requirements,  and 

•  space  requirements. 

Entire books have been written on techniques for proving these properties. We
summarize  here  just  the  few  techniques  that  we  will  find  the  most  useful  in  the  rest  of 
this  book. 


A.7.1  Proving  Correctness  Properties 

We will first consider the problem of proving that a program halts. Then we’ll look at
techniques that can be used to show that a program's result satisfies its specification.

Proving that a Program Halts

When we describe a program to solve a problem, we would like to be able to prove that
the program always halts. One of the main results of the theory that we will develop in
this book is that there can exist no algorithm to solve the halting problem, which we can
state as, “Answer the following question: Given the text of some program M and some
input w, does M halt on input w?” So there can exist no general purpose algorithm that
considers an arbitrary program and determines whether or not it halts on even one
input, much less on all inputs. However, that does not mean that there are not particular
programs that can be shown to halt on all inputs.

Any program that has no loops and no recursive function calls halts when it reaches
the end of its code. So we focus our attention on proving that loops and recursive functions
halt. In a nutshell, any such proof must show that the loop or the recursion executes
some finite number of steps. Sometimes, particularly in the case of for loops, we
can simply state the maximum number of steps.

EXAMPLE A.22 Termination of a For Loop

Consider  the  following  very  simple  program  P: 

P(some arguments) =
    For i = 1 to 10 do:
        Compute something.

As  long  as  the  compute  step  of  P  does  not  modify  i,  we  can  safely  claim  that 
this  loop  executes  at  most  10  times.  (It  could  possibly  execute  fewer  if  it  exits 
prematurely.) 


When dealing with while and until loops and with recursive functions, it may not be
possible to make such a straightforward statement. In proving that any such program P
halts, we will generally rely on the existence of some well-founded set (S, R) such that:

• There exists some bijection between each step of P and some element of the set S,

• The first step of P corresponds to a maximal (with respect to R) element of S,

• Each successive step of P corresponds to a smaller (with respect to R) element of S,
and

• P halts on or before it executes a step that corresponds to a minimal (with respect
to R) element of S.

EXAMPLE  A.23  Choosing  a  Well-Founded  Set 

Consider  the  following  simple  program  P  that  acts  on  a  finite-length  string: 

P(s: string) =
    While length(s) > 0 do:
        Remove the first character from s and call it c.
        If c = a return True.
    Return False.

Let S = {0, 1, 2, ..., |s|}. (S, <) is a well-founded set whose least element is 0.

Associate each step of the loop with |s| as the step is about to be executed. The first
pass through the loop is associated with the initial length of s, which is the maximum
value of |s| throughout the computation. |s| is decremented by one each time
through the loop. P halts when |s| is 0 or before (if it finds the character a). So the
maximum number of times the loop can be executed is the initial value of |s|.
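
The argument above can be made executable by asserting, at the top of every pass, that the measure |s| is a natural number strictly smaller than on the previous pass. The following Python sketch assumes that strings are Python str values; the name contains_a and the sentinel variable previous_length are hypothetical additions.

```python
def contains_a(s: str) -> bool:
    """Example A.23's P, with the well-founded measure |s| checked each pass."""
    previous_length = len(s) + 1  # sentinel larger than any value |s| takes
    while len(s) > 0:
        # The measure is a natural number and strictly smaller than last time.
        assert 0 <= len(s) < previous_length
        previous_length = len(s)
        c, s = s[0], s[1:]  # remove the first character and call it c
        if c == "a":
            return True
    return False


print(contains_a("bca"))  # True
print(contains_a("bcd"))  # False
```

Since a strictly decreasing sequence of natural numbers must be finite, the assertion documents why the loop cannot run forever.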


If we cannot find a well-founded set that corresponds to the steps of a loop or a recursively
defined function, then it is likely that that program fails to halt on at least
some inputs.


EXAMPLE  A.24  When  We  Can't  Find  a  Well-Founded  Set 

Consider the following program P, along with the claim that, given some positive
integer n, P always halts and finds and prints the square root of it:

P(n: positive integer) =
    r = 0.
    Until r*r = n do:
        r = r + 1.
    Print(r).

We could try to prove that P always halts by using the well-founded set (ℕ, <).
Associate each step of the loop with n − r². On entrance to the loop, this difference
must be in ℕ since n is in ℕ and r = 0. The difference decreases at each step
through the loop, as r increases. If r ever equals the square root of n, the difference
will be 0 and the loop will terminate. But, if n is not a perfect square, there is no
guarantee that the difference will not simply become more and more negative. So
there is no bijection between n − r² and ℕ. There is one between n − r² and ℤ
(the integers), but ℤ has no minimal element and so is not well-founded. As it turns
out, there is no well-founded set that can be put in one-to-one correspondence
with the steps of this loop, which cannot be guaranteed to halt.
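
The behavior of the difference n − r² can be observed directly. The Python sketch below records the difference before each pass; the cutoff max_steps is an artificial assumption added so that the sketch itself always halts, and the name difference_trace is hypothetical.

```python
def difference_trace(n: int, max_steps: int) -> list:
    """Record n - r*r before each pass of Example A.24's loop.

    `max_steps` is an artificial cutoff added here so that the sketch itself
    always halts; the original program has no such bound.
    """
    r = 0
    trace = []
    for _ in range(max_steps):
        trace.append(n - r * r)
        if r * r == n:  # the loop's exit test
            break
        r = r + 1
    return trace


print(difference_trace(9, 10))  # [9, 8, 5, 0]
print(difference_trace(7, 6))   # [7, 6, 3, -2, -9, -18]
```

For the perfect square 9 the difference reaches 0 and the loop exits. For 7 it skips past 0 and keeps decreasing through the negative integers.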


Proving  that  a  Program  Computes  the  Correct  Result 

Given that a program halts, does it halt with the correct result? We will find two techniques
particularly useful for proving that it does.

1.  Loop  invariants,  which  we  will  introduce  briefly  here. 

2. Induction, which we reviewed in Section A.6.5.

Often the most straightforward way to analyze any sort of iterated process is to
focus not on what the process does but rather on what it doesn't do. So we'll describe
some key property that does not change at any step of the process's execution.


EXAMPLE  A.25  The  Coffee  Can  Problem 

Consider the following problem, which we’ll call the coffee can problem [Gries
1989]: We have a coffee can that contains some white beans and some black
beans. We perform the following operation on the beans:

Until no further beans can be removed do:
    Randomly choose two beans.
    If the two beans are the same color, then throw both of them away and
    add a new black bean.
    If the two beans are different colors, then throw away the black one and
    return the white one to the can.

It  is  easy  to  show  that  this  process  must  halt.  After  each  step,  the  number  of 
beans  in  the  can  decreases  by  one.  When  only  one  bean  remains,  no  further  beans 
can  be  removed. 

But what can we say about the one remaining bean? Is it white or black? The
answer is that if the original number of white beans is odd, the remaining bean is
white. Otherwise the remaining bean is black. To see why this is true, we note that
our bean culling process preserves white bean parity. In other words, if the number
of white beans starts out even, it stays even. If the number of white beans
starts out odd, it stays odd. To prove that this is so, we consider each action that
the culling process can perform. There are three:

•  Two  white  beans  are  removed  and  one  black  bean  is  added. 

•  Two  black  beans  are  removed  and  one  black  bean  is  added. 

•  One  black  bean  is  removed. 

In each of these, an even number of white beans is removed and white bean
parity is preserved. So, if the number of white beans is initially odd, the number of
white beans can never become zero and a white bean must be the sole survivor. If,
on the other hand, the number of white beans is initially even, it can never become
one. Thus any sole survivor must be black.
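
The parity argument can be checked by simulation. The following Python sketch is an illustration under stated assumptions: the can is assumed to start with at least one bean, the random choice is modeled with random.Random.sample, and the name coffee_can is hypothetical.

```python
import random


def coffee_can(white: int, black: int, rng: random.Random) -> str:
    """Run the coffee can process and return the color of the last bean.

    Assumes the can starts with at least one bean.
    """
    parity = white % 2  # white bean parity on entry
    while white + black > 1:
        # Choose two distinct beans at random from the can.
        first, second = rng.sample(["white"] * white + ["black"] * black, 2)
        if first == second == "white":
            white -= 2  # discard both white beans, add a black bean
            black += 1
        elif first == second == "black":
            black -= 1  # discard both black beans, add a black bean
        else:
            black -= 1  # discard the black bean, return the white one
        assert white % 2 == parity  # white bean parity never changes
    return "white" if white == 1 else "black"


print(coffee_can(3, 4, random.Random(0)))  # white
print(coffee_can(4, 3, random.Random(0)))  # black
```

The printed colors do not depend on the seed, since the invariant fixes the outcome regardless of which beans are drawn.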


The white bean parity property that we just described is an example of a loop invariant:
a predicate I that describes a property that doesn't change during the execution
of an iterative process. To use a loop invariant I to prove the correctness of a
program, we must prove each of the following.

• I is true on entry to the loop.

• The truth of I is maintained at each pass through the loop. By this we mean that, if
I is true at the beginning of a particular pass through the loop, then it must also be
true at the end of that pass. Note, however, that I may fail to hold at some point
partway through the loop.

• I, together with the loop termination condition, implies whatever property we wish
to prove is true on exit from the loop.
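
As an illustration of these three obligations, here is a Python sketch of the counting program that is analyzed in Example A.26, with the invariant asserted at each of the three points. The helper invariant and the use of str.count as an independent reference are assumptions of the sketch, and run-time assertions on sample inputs are a test, not a proof.

```python
def count_a(s: str) -> int:
    """Count the a's in s, asserting the invariant at the three required points."""

    def invariant(count: int, i: int) -> bool:
        # I: #a(s) = count + #a(characters of s from position i onward),
        # with positions numbered from 1 as in the text.
        return s.count("a") == count + s[i - 1:].count("a")

    count = 0
    i = 1
    assert invariant(count, i)          # I holds on entry to the loop
    while i <= len(s):
        if s[i - 1] == "a":
            count = count + 1
        i = i + 1
        assert invariant(count, i)      # I holds again at the end of each pass
    # I together with the exit condition i = length(s) + 1 gives the claim.
    assert i == len(s) + 1 and count == s.count("a")
    return count


print(count_a("banana"))  # 3
```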


EXAMPLE  A.26  Finding  a  Loop  Invariant 

Consider  the  following  program  P: 

P(s: string) =
    count = 0.
    For i = 1 to length(s) do:
        If the i-th character of s is a then:
            count = count + 1.
    Print(count).

Prove that the value of count, on exit from the loop, is the number of a’s in s.
Call this claim C. We will use a loop invariant to prove C.

We'll use the notation #a(s) to mean the number of a's in the string s. Let:

I = [#a(s) = count + #a(the last (length(s) + 1 − i) characters of s)].

In other words, the total number of a's in s is equal to the current value of count
plus the number of a’s in that part of s that has not so far been examined by the loop.
We show:

• I is true on entry to the loop: i = 1 and count = 0. So we have:

#a(s) = 0 + #a(the last (length(s)) characters of s), which is true.

• I is maintained at each step through the loop: If the i-th character of s is an a,
then count is incremented by 1. But i is also incremented, so the number of a’s
in the last (length(s) + 1 − i) characters of s is decremented by 1, leaving
count + #a(the last (length(s) + 1 − i) characters of s) unchanged. If the i-th
character of s is not an a, then the value of both count and the number of a’s in
the last (length(s) + 1 − i) characters of s remains unchanged.

• I, together with the loop termination condition, implies C: On exit from the
loop, i = length(s) + 1. So we have:

I ∧ [i = length(s) + 1] ≡ [#a(s) = count + #a(the last (length(s) + 1 − i)
characters of s)] ∧ [i = length(s) + 1]
→ [#a(s) = count + #a(the last (length(s) + 1 −
(length(s) + 1)) characters of s)]
→ [#a(s) = count + #a(the last 0 characters of s)]
→ #a(s) = count.

So, on exit from the loop, count is equal to the number of a’s in s. Thus C is true.
Note that a separate proof is required to show that the loop does in fact terminate.

EXAMPLE A.27 Finding a Loop Invariant for a Program that Doesn't
Halt

Consider  the  following  program  P,  which  differs  from  our  other  examples  since  it 
is  not  designed  to  halt: 


P() =
    s = ε.
    Loop:
        Print(s).
        s = s || a.

Prove that P will print all and only the finite length strings composed of 0 or
more a’s (and no other characters). We will use a loop invariant to prove that P
prints only strings composed exclusively of a’s. We will use induction to prove that
P will eventually print every string composed only of a’s. The loop invariant we
need is I = [s contains no characters other than a].

We  show: 

• I is true on entry to the loop the first time: s is the empty string and so contains
no characters that are not a.

• I is maintained at each step through the loop: s is unchanged through the loop
except to have a single a added to the end of it. So if it contained only a’s at
the top of the loop, it will contain only a’s at the bottom.

•  We  are  not  concerned  with  what  happens  when  the  loop  in  P  terminates,  since 
it  doesn't.  So  we  can  skip  the  step  in  which  we  show  that  some  statement  is 
true  on  exit  from  the  loop. 

Since I must be true at the top of the loop, it is true when the print statement
executes, so only strings composed exclusively of a’s will be printed.

Now we need to show that P will eventually print any string that is composed
of no characters other than a. We do this by induction on |s|:

• Base step: Let |s| = 0. P prints s the first time through the loop.

• Induction hypothesis: P prints all strings of a’s of length n. Note that, for any
value of n, there is only one such string. Call it aⁿ.

• Prove that P prints all strings of a’s of length n + 1. There is only one such
string, namely aⁿa. By the induction hypothesis, P generates aⁿ. When it does
that, the variable s is equal to aⁿ. The next thing P does is to concatenate one
more a onto s, which then equals aⁿa, and print it.

So, for all n ≥ 0, P prints the string composed of exactly n a’s.

A.7.2 Analyzing Complexity

Whenever we present a program P, we may want to ask the question, “How long will it
take P to run?” or, “How much memory will P use?” Generally the answer depends on
the size of the input. So our answer will usually be stated as a function of some number
that corresponds to a reasonable measure of the size of the input. If the input is a
string, we can use the length of the string. If the input is a structure like a list or an
array, we can use the number of elements in the structure. If the input is a number, we
will typically use the length of the binary or decimal encoding of the number.

In Part V of this book we present a formal theory of both time and space complexity.
Here we present an informal treatment of the approach to time complexity that we will
describe there.

We will describe the time complexity of a program P as a function of the size of its
input, which we'll call n. We are typically not interested in how long it takes P to run on
small inputs. Rather we are concerned with how quickly execution time grows as n
grows. While in some cases we are concerned with an exact count of the number of
steps that P must execute, we are often willing to ignore constant factors and instead to
concentrate on whether P's execution time

• is constant (i.e., it is independent of n),

• grows linearly with n,

• grows faster than n but at a rate that can be described by some polynomial function
of n (for example, n²), or

• grows at a rate that is faster than any polynomial function of n (for example, 2ⁿ).

Suppose that we have a program that, on input of length n, executes n³ + 2n + 3
steps. As n increases, the n³ term dominates the other two. So we would like to ignore the
slower growing terms of the function n³ + 2n + 3 and characterize the time required to
execute this program as the simpler function n³. To do that, we introduce the notion of
asymptotic dominance of one function by another.

Let f(n) and g(n) be functions from the natural numbers to the positive reals. Then
we'll say that the function g(n) asymptotically dominates the function f(n) iff there
exists a positive integer k and a positive constant c such that:

∀n ≥ k (f(n) ≤ c·g(n)).

In other words, ignoring some number of small cases (all those of size less than k),
and ignoring some constant factor c, f(n) is bounded from above by g(n).

We will use the symbol O to denote the asymptotic dominance relation, so O(g(n))
is the set of all functions that are asymptotically dominated by g(n). Thus, if g(n)
asymptotically dominates f(n), we will write:

f(n) ∈ O(g(n)).

This claim is read, “f is big-O of g”. It is also often written f(n) = O(g(n)), although
that statement is not literally correct since O(g(n)) is a set of functions, not a function.

EXAMPLE A.28 O

n³ + 2n + 3 ∈ O(n³), since we can let k = 2 and c = 2 and observe that for all
n ≥ 2, n³ + 2n + 3 ≤ 2n³.
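
The choice of k and c in this example can be checked numerically over a finite range. The Python sketch below is a sanity check under stated assumptions: the name dominated_on_range and the bound upper are hypothetical, and a finite check can refute a proposed pair (k, c) but cannot prove dominance for all n.

```python
def dominated_on_range(f, g, k: int, c: float, upper: int) -> bool:
    """Check f(n) <= c * g(n) for every n with k <= n <= upper.

    A finite check can refute a proposed (k, c) but cannot prove dominance.
    """
    return all(f(n) <= c * g(n) for n in range(k, upper + 1))


def f(n):
    return n**3 + 2 * n + 3


def g(n):
    return n**3


print(dominated_on_range(f, g, k=2, c=2, upper=10_000))  # True
print(dominated_on_range(f, g, k=1, c=2, upper=10_000))  # False: fails at n = 1
```

The second call shows why k matters: at n = 1 we have f(1) = 6, which exceeds 2·g(1) = 2.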


Now we can return to the problem of characterizing the execution time of a program
P. Let f(n) be a function that describes the time required to execute P as a function
of n, where n is some reasonable measure of the size of P's input. We'll say that P
runs in time O(g(n)) iff f(n) ∈ O(g(n)).


EXAMPLE  A.29  Using  O  to  Measure  Time  Complexity: 

A  Linear  Example 

Consider  again  the  program  P  from  Example  A.26: 

P(s: string) =
    count = 0.
    For i = 1 to length(s) do:
        If the i-th character of s is a then:
            count = count + 1.
    Print(count).

Let n = length(s). The number of program steps that P executes is at most
2 + 2n ∈ O(n). So the execution time of P grows linearly in the length of its input.
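
The bound 2 + 2n can be reproduced by instrumenting the program. The Python sketch below assumes one particular way of counting steps (one step for the initialization, one for each test of a character, one for each increment, and one for the print); the text does not fix this accounting, and the name count_a_with_steps is hypothetical.

```python
def count_a_with_steps(s: str):
    """Return (count, steps), charging one step per statement executed."""
    steps = 1                      # count = 0
    count = 0
    for ch in s:
        steps += 1                 # the test "is this character an a?"
        if ch == "a":
            count += 1
            steps += 1             # count = count + 1
    steps += 1                     # Print(count)
    return count, steps


print(count_a_with_steps("aaaa"))  # (4, 10)
print(count_a_with_steps("abc"))   # (1, 6)
```

Under this accounting the worst case is a string consisting entirely of a's, which takes exactly 2 + 2n steps.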


EXAMPLE  A.30  Using  O  to  Measure  Time  Complexity: 

A  Quadratic  Example 

Consider  the  following  program  P ,  which  returns  True  if  any  two  elements  of  its 
input  vector  are  the  same  and  False  otherwise: 

P(v: vector of integers) =
    For i = 1 to length(v) do:
        For j = i + 1 to length(v) do:
            If v[i] = v[j] then return True.
    Return False.
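
The program above can be sketched in Python with a comparison counter added. The counter, the name has_duplicate, and the use of a Python list for the vector are assumptions of the sketch, and indices start at 0 rather than 1.

```python
def has_duplicate(v):
    """Return (answer, comparisons) for the duplicate-finding program P."""
    comparisons = 0
    n = len(v)
    for i in range(n):
        for j in range(i + 1, n):
            comparisons += 1       # one test of v[i] = v[j]
            if v[i] == v[j]:
                return True, comparisons
    return False, comparisons


print(has_duplicate([1, 2, 3, 4]))  # (False, 6)
print(has_duplicate([1, 2, 1]))     # (True, 2)
```

When no two elements match, the test runs n(n − 1)/2 times, which grows as the square of n.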

Let n = length(v). In the worst case, P goes through the outer loop n times.
At each pass, unless it finds a match, it goes through the inner loop on average
n/2 times. So the number of program steps that P executes is at most
1 + n(1 + 2n/2) = 1 + n + n² ∈ O(n²). So the execution time of P grows as the
square of the length of its input.

Suppose that a program P, on input of size n, runs in time f(n) = 2 + 4n. Then
f(n) ∈ O(n). But notice that it is also true that f(n) ∈ O(n²) and f(n) ∈ O(2ⁿ), since
both n² and 2ⁿ also asymptotically dominate 2 + 4n. In Chapter 27 we will define Θ, a
relation that is similar to O except that it is stricter. Specifically,

f(n) ∈ Θ(g(n)) iff f(n) ∈ O(g(n)) and g(n) ∈ O(f(n)).

So 2 + 4n ∈ Θ(n), but 2 + 4n ∉ Θ(n²) because n² ∉ O(n).

Discussions of the complexity of algorithms should use Θ, whenever possible, since
we want the tightest bound we can find. But that is not the convention. As we did in
both Example A.29 and Example A.30, we will use the standard convention of writing
f(n) ∈ O(g(n)) instead of f(n) ∈ Θ(g(n)), but, whenever we can, we will choose values
for g(n) such that the claim that f(n) ∈ Θ(g(n)) would also be true.

In analyzing the algorithms that we will consider in Parts II through IV of this book,
we will use the O relation as we have just defined it. In Chapter 27, we will have more
to say about O and similar relations such as Θ.


A.8  A  General  Definition  of  Closure  * 

In Section A.5 we introduced closures. We elaborate on that discussion here. We begin
by reviewing what we said there. Imagine some set S and some property P. If we care
about making sure that S has property P, we could do the following.

1. Examine S for P. If it has property P, we quit.

2. If it doesn't, then add to S the smallest number of additional elements required to
satisfy P.

We will say that S is closed with respect to P iff it possesses P. And, if we have to
add elements to S in order to satisfy P, we'll call a smallest such expanded S that does
satisfy P a closure of S with respect to P.


EXAMPLE  A.31  Some  Relations  and  Their  Closures 

1. Let S be a set of friends we are planning to invite to a party. Let P be, “S should
include everyone who is likely to find out about the party” (since we don’t
want to offend anyone). Let's assume that if you invite Bill and Bill has a
friend Bob, then Bill may tell Bob about the party. This means that if you want
S to satisfy P, then you have to invite not only your friends, but your friends’
friends, and their friends, and so forth. If you move in a fairly closed circle, you
may be able to satisfy P by adding a few people to the guest list. On the other
hand, it’s possible that you'd have to invite the whole city before P would be
satisfied. It depends on the connectivity of the Friendof relation in your social
setting. The problem is that whenever you add a new person to S, you have to
turn around and look at that person’s friends and consider whether there are
any of them who are not already in S. If there are, they must be added, and so
forth. There is one positive feature of this problem, however. Notice that
there is a unique set that does satisfy P, given the initial set S. There aren’t any
choices to be made.

2. Let S be a set of six people. Let P be, “S can enter a baseball tournament.”
This problem is different from the previous one in two important ways.
First, there is a clear limit on how many elements we have to add to S in
order to satisfy P. We need nine people and when we’ve got them we can
stop. But notice that there is not a unique way to satisfy P (assuming that
we know more than nine people). Any way of adding three people to S
will work.

3. Let S be the Address relation (which we defined earlier as “lives at same
address as”). Since relations are sets, we should be able to treat Address just
as we’ve treated the sets of people in our last two examples. We know that
Address is an equivalence relation. So we’ll let P be the property of being an
equivalence relation (i.e., reflexive, symmetric, and transitive). But suppose
we are only able to collect facts about living arrangements in a piecemeal
fashion. For example, we may learn that Address contains (Dave, Stacy),
(Jen, Pete), (John, Bill). Immediately we know, because Address must be
reflexive, that it must also contain (Dave, Dave), (Stacy, Stacy), (Jen, Jen),
(Pete, Pete), (John, John), (Bill, Bill). And, since Address must also be
symmetric, it must contain (Stacy, Dave), (Pete, Jen), (Bill, John). Now suppose
that we discover that Stacy lives with Jen. We add (Stacy, Jen). To make
Address symmetric again, we must add (Jen, Stacy). But now we also have to
make it transitive by adding (Dave, Jen), (Jen, Dave) and, because Jen lives
with Pete, (Stacy, Pete), (Pete, Stacy), (Dave, Pete), (Pete, Dave).

4.  Let  S  be  the  set  of  positive  integers.  Let  P  be, “The  sum  of  any  two  elements 
of  S  is  also  in  S.”  Now  we’ve  got  a  property  that  is  already  satisfied. The  sum 
of  any  two  positive  integers  is  a  positive  integer.  This  time,  we  don’t  have  to 
add  anything  to  S  to  establish  P. 

5. Let S again be the set of positive integers. Let P be, “The quotient of any
two elements of S is also in S.” This time we have a problem. 3/5 is not a
positive integer. We can add elements to S to satisfy P. If we do, we end up
with exactly the positive rational numbers.


To use closures effectively, we need to define precisely what we mean when we say
that a set S is closed under P or that the closure of S under P is T. We present here a set
of definitions that include all but one of the specific cases that we just described. The
definitions of closure that we presented in Section A.5 are special cases of the ones
presented here. The one requirement that must be met in order to apply these definitions
to a closure problem is that we must be able to describe the property P that is to
be maintained as a relation.
Let n be an integer greater than or equal to 1. Let R be an n-ary relation on a set A.
Thus elements of R are of the form (d₁, d₂, ..., dₙ). We say that a subset S of A is closed
under R iff, whenever:

• d₁, d₂, ..., dₙ₋₁ ∈ S (all of the first n − 1 elements are already in the set S), and

• (d₁, d₂, ..., dₙ₋₁, dₙ) ∈ R (the last element is related to the n − 1 other elements via R),

it is also true that dₙ ∈ S.

A set S' is a closure of S with respect to R (defined on A) iff:

• S ⊆ S',

• S' is closed under R, and

• ∀T ((S ⊆ T and T is closed under R) → |S'| ≤ |T|).

In other words, S' is a closure of S with respect to R if it is an extension (i.e., a superset)
of S that is closed under R and if there is no smaller set that also meets both of
those requirements. Note that we cannot say that S' must be the smallest set that will
do the job, since we do not yet have any guarantee that there is a unique such smallest
set (recall the baseball example above).

These  definitions  of  closure  are  a  very  natural  way  to  describe  our  first  example  above. 
Drawing  from  a  set  A  of  people,  you  start  with  S  equal  to  your  friends. Then,  to  compute 
your  invitee  list  S',  you  simply  take  the  closure  of  S  with  respect  to  the  relation  Friendof, 
which  will  force  you  to  add  to  S'  your  friends'  friends,  their  friends,  and  so  forth. 
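For a finite relation, the closure just defined can be computed by repeatedly adding any element that the relation forces into the set, until nothing changes. The following Python sketch is one way to do that. The function name closure, the encoding of R as a set of tuples, and the sample Friendof pairs are our own illustrative assumptions.

```python
def closure(S, R):
    """Return the smallest superset of S that is closed under R.

    R is a finite set of n-tuples (d1, ..., dn). Whenever d1, ..., d(n-1)
    are all in the set, dn is added as well.
    """
    closed = set(S)
    changed = True
    while changed:
        changed = False
        for tup in R:
            *premises, conclusion = tup
            if conclusion not in closed and all(d in closed for d in premises):
                closed.add(conclusion)
                changed = True
    return closed

# Hypothetical Friendof relation: (x, y) means that x may tell y about the party.
friend_of = {("Bill", "Bob"), ("Bob", "Carol"), ("Dana", "Eve")}
invitees = closure({"Bill"}, friend_of)
```

Starting from Bill alone, the loop adds Bob and then Carol, but never Dana or Eve, since nothing already in the set is related to them.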

These definitions also apply naturally to our fifth example, the positive integers
under division. The smallest set that contains the positive integers and that is closed
under division is the positive rationals. So the closure under division of the positive
integers is the positive rationals.

Now consider our second example, the case of the baseball team. Here there is no
relation R that specifies, if one or more people are already on the team, that some specific
other person must also be on. The property we care about is a property of the
team (set) as a whole and not a property of patterns of individuals (elements). Thus
this example, although similar, is not formally an instance of closure as we have just
defined it. This turns out to be significant and leads us to the following definition:

Any property that asserts that a set S is closed under some relation R is called a
closure property of S.


THEOREM  A.5  Closures  Exist  and  are  Unique 

Theorem: If R is a closure property, as just defined, on a set A and S is a subset of
A, then the closure of S with respect to R exists and is unique.

Proof:  Omitted. 

Stating the theorem another way, if its conditions are met then there exists a unique
minimal set S' that contains S and is closed under R. Of all of our examples above, the
baseball example is the only one that cannot be described in the terms of this definition
of a closure property. The theorem that we have just stated (without proof) guarantees,
therefore, that it will be the only one that does not have a unique minimal solution.

The definitions that we have just provided also work to describe our third example,
in which we want to compute the closure of a relation (since, after all, a relation is a
set). All we have to do is to come up with relations that describe the properties of being
reflexive, symmetric, and transitive. To help us see what those relations need to be, let’s
recall our definitions of symmetry, reflexivity, and transitivity:

• A binary relation R ⊆ A × A is reflexive iff, for each a ∈ A, (a, a) ∈ R.

• A binary relation R ⊆ A × A is symmetric iff, whenever (a, b) ∈ R, so is (b, a).

• A binary relation R ⊆ A × A is transitive iff, whenever (a, b) ∈ R and (b, c) ∈ R,
(a, c) ∈ R.

Looking  at  these  definitions,  we  can  come  up  with  three  relations,  Reflexivity, 
Symmetry,  and  Transitivity.  All  three  are  relations  on  relations,  and  they  will  enable  us 
to  define  these  three  properties  using  the  closure  definitions  we’ve  given  so  far.  All 
three  definitions  assume  a  base  set  A  on  which  the  relation  that  we  are  interested  in  is 
defined: 


For any a in A, ((a, a)) ∈ Reflexivity and no other elements are. Notice the double
parentheses here. Reflexivity is a unary relation, where each element is itself an
ordered pair. It doesn't really “relate” two elements. It is simply a list of ordered pairs.
To see how it works to define reflexive closure, imagine a set A = {x, y}. Now suppose
we start with a relation R on A = {(x, y)}. Clearly R isn’t reflexive: The
Reflexivity relation on A is {((x, x)), ((y, y))}. Reflexivity is a unary relation. So n,
in the definition of closure, is 1. Consider the first element ((x, x)). We consider all
the components before the nth (i.e., first) and see if they are in R. This means we
consider the first zero components. Trivially, all zero of them are in R. So the nth
(the first) must also be. This means that (x, x) must be in R. But it isn’t. So to compute
the closure of R under Reflexivity, we add it. Similarly for (y, y).

For any a and b in A, a ≠ b → [((a, b), (b, a)) ∈ Symmetry] and no other elements
are. This one is a lot easier. Again, suppose we start with a set A = {x, y} and
a relation R on A = {(x, y)}. Clearly R isn't symmetric: Symmetry on
A = {((x, y), (y, x)), ((y, x), (x, y))}. But look at the first element of Symmetry. It
tells us that for R to be closed under Symmetry, whenever (x, y) is in R, (y, x) must
also be. But it isn’t. To compute the closure of R under Symmetry, we must add it.


For any a, b, and c in A, [a ≠ b ∧ b ≠ c] → [((a, b), (b, c), (a, c)) ∈ Transitivity]
and no other elements are. Now we will exploit a ternary relation. Whenever the
first two elements of it are present in some relation R, then the third must also
be if R is transitive. This time, let's start with a set A = {x, y, z} and a relation R
on A = {(x, y), (y, z)}. Clearly R is not transitive: The Transitivity relation on A
is {((x, y), (y, z), (x, z)), ((x, z), (z, y), (x, y)), ((y, x), (x, z), (y, z)), ((y, z), (z, x),
(y, x)), ((z, x), (x, y), (z, y)), ((z, y), (y, x), (z, x))}. Look at the first element of it.
Both of the first two components of it are in R. But the third isn't. To make R
transitive, we must add it.
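The three relations on relations can be built mechanically from a finite base set A, after which the general closure procedure applies unchanged. The Python sketch below does this under our own encoding: ordered pairs are 2-tuples, each element of Reflexivity is a 1-tuple holding a pair, and the helper names are illustrative. The transitivity helper follows the stated condition (a ≠ b and b ≠ c) literally, so it also generates triples with a = c, which the six-element listing above leaves out.

```python
from itertools import product

def closure(S, R):
    # Smallest superset of S closed under the finite relation R.
    closed = set(S)
    changed = True
    while changed:
        changed = False
        for *premises, conclusion in R:
            if conclusion not in closed and all(p in closed for p in premises):
                closed.add(conclusion)
                changed = True
    return closed

def reflexivity(A):
    # Unary relation: each element is a 1-tuple containing the pair (a, a).
    return {((a, a),) for a in A}

def symmetry(A):
    # Binary relation on pairs: (a, b) forces (b, a).
    return {((a, b), (b, a)) for a, b in product(A, repeat=2) if a != b}

def transitivity(A):
    # Ternary relation on pairs: (a, b) and (b, c) force (a, c).
    return {((a, b), (b, c), (a, c))
            for a, b, c in product(A, repeat=3) if a != b and b != c}

A = {"x", "y", "z"}
R = {("x", "y"), ("y", "z")}
transitive_closure = closure(R, transitivity(A))
equivalence_closure = closure(R, reflexivity(A) | symmetry(A) | transitivity(A))
```

Closing R under the union of the three relations yields the smallest equivalence relation containing R. Here that is all of A × A, because x, y, and z end up related to one another.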
We can also describe the closure of the positive integers under division with a
closure property: Let A be the positive rationals, let S be the positive integers, and
let R be Quotientclosure, defined as:

• For any a, b, and c in A, [a/b = c] → [(a, b, c) ∈ Quotientclosure].

So there exists a unique closure of S with respect to Quotientclosure. In this case,
that closure is A.

We  now  have  a  general  definition  of  closure  that  makes  it  possible  to  prove  the  exis¬ 
tence  of  a  unique  closure  for  any  set  and  any  relation  R. The  only  constraint  is  that  this 
definition  works  only  if  we  can  define  the  property  we  care  about  as  an  n-ary  relation 
for  some  finite  n. There  are  cases  of  closure  where  this  is  not  possible,  as  we  saw  above 
in  the  baseball  team  example,  but  we  will  not  consider  them  further. 


Exercises 

1.  Prove  each  of  the  following: 

a. ((A ∧ B) → C) ↔ (¬A ∨ ¬B ∨ C).

b. (A ∧ ¬B ∧ ¬C) → (A ∨ ¬(B ∧ C)).

2.  List  the  elements  of  each  of  the  following  sets: 

a. 𝒫({apple, pear, banana}).

b. 𝒫({a, b}) − 𝒫({a, c}).

c. 𝒫(∅).

d. {a, b} × {1, 2, 3} × ∅.

e. {x ∈ ℕ : (x < 7 ∧ x > 7)}.

f. {x ∈ ℕ : ∃y ∈ ℕ (y < 10 ∧ (y + 2 = x))} (where ℕ is the set of nonnegative
integers).

g. {x ∈ ℕ : ∃y ∈ ℕ (∃z ∈ ℕ ((x = y + z) ∧ (y < 5) ∧ (z < 4)))}.

3.  Prove  each  of  the  following: 

a. A ∪ (B ∩ C ∩ D) = (A ∪ B) ∩ (A ∪ D) ∩ (A ∪ C).

b. A ∪ (B ∩ C ∩ A) = A.

c. (B ∩ C) − A ⊆ C.

4. Consider the English sentence, “If some bakery sells stale bread and some hotel
sells flat soda, then the only thing everyone likes is tea.” This sentence has at
least two meanings. Write two (logically different) first-order logic sentences that
correspond to meanings that could be assigned to this sentence. Use the following
predicates: P(x) is True iff x is a person; B(x) is True iff x is a bakery; Sb(x) is
True iff x sells stale bread; H(x) is True iff x is a hotel; Ss(x) is True iff x sells flat
soda; L(x, y) is True iff x likes y; and T(x) is True iff x is tea.

5. Let P be the set of positive integers. Let L = {A, B, ..., Z} (i.e., the set of upper
case characters in the English alphabet). Let T be the set of strings of one or more
uppercase English characters. Define the following predicates over those sets.

• For x ∈ L, V(x) is True iff x is a vowel. (The vowels are A, E, I, O, and U.)

• For x ∈ L and n ∈ P, S(x, n) is True iff x can be written in n strokes.

• For x ∈ L and s ∈ T, O(x, s) is True iff x occurs in the string s.

• For x, y ∈ L, B(x, y) is True iff x occurs before y in the English alphabet.

• For x, y ∈ L, E(x, y) is True iff x = y.

Using these predicates, write each of the following statements as a sentence in
first-order logic:

a. A is the only upper case English character that is a vowel and that can be written
in three strokes but does not occur in the string STUPID.

b. There is an upper case English character strictly between K and R that can be
written in one stroke.

6. Choose a set A and predicate P and then express the set {1, 4, 9, 16, 25, 36, ...}
in the form:

{x ∈ A : P(x)}.

7.  Find  a  set  that  has  a  subset  but  no  proper  subset. 

8.  Give  an  example,  other  than  one  of  the  ones  in  the  book,  of  a  reflexive,  symmetric, 
intransitive  relation  on  the  set  of  people. 

9.  Not  equal  (defined  on  the  integers)  is  (circle  all  that  apply):  reflexive, symmetric, 
transitive. 

10. In Section A.3.3, we showed a table that listed the eight possible combinations of
the  three  properties:  reflexive,  symmetric  and  transitive.  Add  antisymmetry  to  the 
table.  There  are  now  16  possible  combinations.  Which  combinations  could  some 
nontrivial  binary  relation  possess?  Justify  your  answer  with  examples  to  show  the 
combinations  that  are  possible  and  proofs  of  the  impossibility  of  the  others. 

11. Using the definition of ≡ₚ (equivalence modulo p) that is given in Example A.4,
let Rₚ be a binary relation on ℕ, defined as follows, for any p ≥ 1:

Rₚ = {(a, b) : a ≡ₚ b}.

So, for example, R₃ contains (0, 0), (6, 9), (1, 4), etc., but does not contain (0, 1),
(3, 4), etc.

a. Is Rₚ an equivalence relation for every p ≥ 1? Prove your answer.

b. If Rₚ is an equivalence relation, how many equivalence classes does it induce
for a given value of p? What are they? (Any concise description is fine.)

c. Is Rₚ a partial order? A total order? Prove your answer.

12. Let S = {w ∈ {a, b}*}. Define the relation Substr on the set S to be {(s, t) : s is a
substring of t}.

a. Choose a small subset of Substr and draw it as a graph (in the same way that
we drew the graph of Example A.5).

b. Is Substr a partial order?

13.  Let  P  be  the  set  of  people.  Define  the  function: 

father-of: P → P.

father-of(x) = the person who is x’s father

a. Is father-of one-to-one?

b.  Is  it  onto? 
14. Are the following sets closed under the following operations? If not, give an
example that proves that they are not and then specify what the closure is.

a.  The  negative  integers  under  subtraction. 

b.  The  negative  integers  under  division. 

c.  The  positive  integers  under  exponentiation. 

d.  The  finite  sets  under  Cartesian  product. 

e.  The  odd  integers  under  remainder,  mod  3. 

f. The rational numbers under addition.


15.  Give  examples  to  show  that: 

a.  The  intersection  of  two  countably  infinite  sets  can  be  finite. 

b.  The  intersection  of  two  countably  infinite  sets  can  be  countably  infinite. 

c.  The  intersection  of  two  uncountable  sets  can  be  finite. 

d.  The  intersection  of  two  uncountable  sets  can  be  countably  infinite. 

e.  The  intersection  of  two  uncountable  sets  can  be  uncountable. 


16. Let R = {(1, 2), (2, 3), (3, 5), (5, 7), (7, 11), (11, 13), (4, 6), (6, 8), (8, 9), (9, 10),
(10, 12)}. Draw a directed graph representing R*, the reflexive, transitive closure of R.

17. Let ℕ be the set of nonnegative integers. For each of the following sentences in
first-order logic, state whether the sentence is valid, is not valid but is satisfiable,
or is unsatisfiable. Assume the standard interpretation for < and >. Assume
that f could be any function on the integers. Prove your answer.

a. ∀x ∈ ℕ (∃y ∈ ℕ (y < x))

b. ∀x ∈ ℕ (∃y ∈ ℕ (y > x))

c. ∀x ∈ ℕ (∃y ∈ ℕ f(x) = y)


18. Let ℕ be the set of nonnegative integers. Let A be the set of nonnegative integers
x such that x ≡₃ 0. Show that |ℕ| = |A|.

19.  What  is  the  cardinality  of  each  of  the  following  sets?  Prove  your  answer. 

a. {n ∈ ℕ : n ≡₃ 0}

b. {n ∈ ℕ : n ≡₃ 0} ∩ {n ∈ ℕ : n is prime}.

c. {n ∈ ℕ : n ≡₃ 0} ∪ {n ∈ ℕ : n is prime}

20.  Prove  that  the  set  of  rational  numbers  is  countably  infinite. 

21.  Use  induction  to  prove  each  of  the  following  claims: 


a. ∀n > 0 (Σᵢ₌₁ⁿ i² = n(n + 1)(2n + 1)/6).

b. ∀n > 0 (n! ≥ 2ⁿ⁻¹). Recall that 0! = 1 and ∀n > 0 (n! = n(n − 1)(n − 2) ... 1).

c. ∀n ≥ 0 (Σₖ₌₀ⁿ 2ᵏ = 2ⁿ⁺¹ − 1).

d. ∀n ≥ 0 (Σₖ₌₀ⁿ rᵏ = (rⁿ⁺¹ − 1)/(r − 1)), given r ≠ 0, 1.

e. ∀n ≥ 0 (Σₖ₌₀ⁿ fₖ² = fₙ fₙ₊₁), where fₙ is the nth element of the Fibonacci
sequence, as defined in Example 24.4.
22. Consider a finite rectangle in the plane. We will draw some number of (infinite)
lines that cut through the rectangle. So, for example, we might have:


In  Section  28.7.6,  we  define  what  we  mean  when  we  say  that  a  map  can  be  col¬ 
ored  using  two  colors. Treat  the  rectangle  that  we  just  drew  as  a  map,  with  regions 
defined  by  the  lines  that  cut  through  it.  Use  induction  to  prove  that,  no  matter 
how  many  lines  we  draw,  the  rectangle  can  be  colored  using  two  colors. 

23. Let div2(n) = ⌊n/2⌋ (i.e., the largest integer that is less than or equal to n/2).
Alternatively, think of it as the function that performs division by 2 on a binary
number by shifting right one digit. Prove that the following program correctly
multiplies two natural numbers. Clearly state the loop invariant that you are
using.

mult(n, m: natural numbers) =
   result = 0.
   While m ≠ 0 do
      If odd(m) then result = result + n.
      n = 2n.
      m = div2(m).

24. Prove that the following program computes the function double(s) where, for any
string s, double(s) = True if s contains at least one pair of adjacent characters
that are identical and False otherwise. Clearly state the loop invariant that you
are using.

double(s: string) =
   found = False.
   for i = 1 to length(s) − 1 do
      if s[i] = s[i + 1] then found = True.
   return(found).
The Theory: Working
with  Logical  Formulas 


Boolean  formulas  describe  circuits.  First-order  logic  formulas  encode  software 
specifications  and  robot  plans.  We  need  efficient  and  correct  techniques  for 
manipulating  them.  In  this  appendix,  we  present  some  fundamental  theoretical 
results  that  serve  as  the  basis  for  such  techniques.  We'll  begin  with  Boolean  formulas. 
Then  we'll  consider  the  extension  of  some  of  the  Boolean  ideas  to  first-order  logic. 


B.1  Working  with  Boolean  Formulas:  Normal  Forms, 
Resolution  and  OBDDs 

In  this  section  we  discuss  three  issues  that  may  arise  when  working  with  Boolean 
(propositional)  formulas: 

•  conversion  of  an  arbitrary  Boolean  formula  into  a  more  restricted  form  (a  normal 
form). 

• Boolean resolution, a proof by refutation technique, and

•  efficient  manipulation  of  Boolean  formulas. 

B.1.1  Normal  Forms  for  Boolean  Logic 

Recall  that  a  normal  form  for  a  set  of  data  objects  is  a  restricted  syntactic  form  that 
simplifies  one  or  more  operations  on  the  objects.  When  we  use  the  term  “normal 
form,”  we  generally  require  that  every  object  in  the  original  set  have  some  equivalent 
(with  respect  to  the  operations  for  which  the  normal  form  will  be  used)  representation 
in  the  restricted  form. 
In this section we define three important normal forms for Boolean formulas and
we prove that any Boolean formula has a corresponding formula in each of those
normal forms. We begin with some definitions:

A literal in a Boolean formula is either an atomic proposition (a simple Boolean
variable), or an atomic proposition preceded by a single negation symbol. So P, Q, and
¬P are all literals. A positive literal is a literal that is not preceded by a negation symbol.

A negative literal is a literal that is preceded by a negation symbol.

A clause is either a single literal or the disjunction of two or more literals. So
P, P ∨ ¬P, and P ∨ ¬Q ∨ R ∨ S are all clauses.

Conjunctive  Normal  Form 

A  well-formed  formula  (wff)  of  Boolean  logic  is  in  conjunctive  normal  form  iff  it  is 
either  a  single  clause  or  the  conjunction  of  two  or  more  clauses.  The  following  formu¬ 
las  are  in  conjunctive  normal  form. 

•  P 

• P ∨ ¬Q ∨ R ∨ S

• (P ∨ ¬Q ∨ R ∨ S) ∧ (¬P ∨ R)

The following formulas are not in conjunctive normal form.

• P → Q

• ¬(P ∨ ¬Q)

• (P ∧ ¬Q ∧ R ∧ S) ∨ (¬P ∧ ¬R)

THEOREM  B.1  Conjunctive  Normal  Form  Theorem 

Theorem:  Given  w,  an  arbitrary  wff  of  Boolean  logic,  there  exists  a  wff  w'  that  is  in 
conjunctive  normal  form  and  that  is  equivalent  to  w. 

Proof: The proof is by construction. The following algorithm conjunctiveBoolean
computes w' given w:

conjunctiveBoolean(w: wff of Boolean logic) =

1. Eliminate → and ↔ from w, using the fact that P → Q is equivalent to
¬P ∨ Q and that P ↔ Q is equivalent to (P → Q) ∧ (Q → P).

2. Reduce the scope of each ¬ to a single term, using the facts:

• Double negation: ¬(¬P) ≡ P.

• de Morgan’s laws:

• ¬(P ∧ Q) ≡ (¬P ∨ ¬Q).

• ¬(P ∨ Q) ≡ (¬P ∧ ¬Q).

3. Convert w to a conjunction of clauses using the fact that both ∨ and ∧
are associative and the fact that ∨ and ∧ distribute over each other.
EXAMPLE B.1 Boolean Conjunctive Normal Form

Let w be the wff P → ¬(R ∨ ¬Q). Then w can be converted to conjunctive normal
form as follows.

• Step 1 produces ¬P ∨ ¬(R ∨ ¬Q).

• Step 2 produces ¬P ∨ (¬R ∧ Q).

• Step 3 produces (¬P ∨ ¬R) ∧ (¬P ∨ Q).
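The three steps of the conversion to conjunctive normal form can be sketched as three small recursive functions. In the Python sketch below, the encoding is our own assumption: a variable is a string, and compound formulas are tuples such as ("not", f), ("and", f, g), ("or", f, g), ("implies", f, g), and ("iff", f, g). The function names are illustrative, and the rewriting of "iff" as two implications is the standard equivalence.

```python
def eliminate(w):
    """Step 1: rewrite implies and iff in terms of not, and, or."""
    if isinstance(w, str):
        return w
    op, *args = w
    args = [eliminate(a) for a in args]
    if op == "implies":
        return ("or", ("not", args[0]), args[1])
    if op == "iff":
        return ("and", ("or", ("not", args[0]), args[1]),
                       ("or", ("not", args[1]), args[0]))
    return (op, *args)

def push_not(w):
    """Step 2: move each negation inward until it applies to a variable."""
    if isinstance(w, str):
        return w
    op, *args = w
    if op != "not":
        return (op, push_not(args[0]), push_not(args[1]))
    inner = args[0]
    if isinstance(inner, str):
        return w
    iop, *iargs = inner
    if iop == "not":                       # double negation
        return push_not(iargs[0])
    dual = "or" if iop == "and" else "and"  # de Morgan's laws
    return (dual, push_not(("not", iargs[0])), push_not(("not", iargs[1])))

def distribute(w):
    """Step 3: distribute or over and."""
    if isinstance(w, str) or w[0] == "not":
        return w
    op, left, right = w[0], distribute(w[1]), distribute(w[2])
    if op == "and":
        return ("and", left, right)
    if not isinstance(left, str) and left[0] == "and":
        return ("and", distribute(("or", left[1], right)),
                       distribute(("or", left[2], right)))
    if not isinstance(right, str) and right[0] == "and":
        return ("and", distribute(("or", left, right[1])),
                       distribute(("or", left, right[2])))
    return ("or", left, right)

def conjunctive_boolean(w):
    return distribute(push_not(eliminate(w)))

# The wff of Example B.1: P -> not(R or not Q).
w = ("implies", "P", ("not", ("or", "R", ("not", "Q"))))
cnf = conjunctive_boolean(w)
```

On the wff of Example B.1 the three passes reproduce the three steps shown there. Note that step 3 can make the formula exponentially longer in the worst case, which the sketch does nothing to avoid.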


Conjunctive normal form is useful as a basis for describing 3-conjunctive normal
form, as we are about to do. It is also important because its extension to first-order
logic formulas is useful, as we'll see below, in a variety of applications that require
automatic theorem proving.

3-Conjunctive  Normal  Form 

A well-formed formula (wff) of Boolean logic is in 3-conjunctive normal form (3-CNF)
iff it is in conjunctive normal form and each clause contains exactly three literals.
So the following formulas are in 3-conjunctive normal form:

• (¬Q ∨ R ∨ S).

• (¬Q ∨ R ∨ S) ∧ (¬P ∨ ¬R ∨ ¬Q).

3-conjunctive normal form is important because it allows us to define 3-SAT = {w : w
is a wff in Boolean logic, w is in 3-conjunctive normal form and w is satisfiable}. 3-SAT is
important because it is NP-complete and reduction from it can often be used to show that
a new language is also NP-complete.

THEOREM  B.2  3-Conjunctive  Normal  Form  Theorem 

Theorem: Given a Boolean wff w in conjunctive normal form, there exists an algorithm
that constructs a new wff w' that is in 3-conjunctive normal form and that
is satisfiable iff w is.

Proof: The following algorithm 3-conjunctiveBoolean computes w' given w:

3-conjunctiveBoolean(w: wff in conjunctive normal form) =

1. If, in w, there are any clauses with more than three literals, split them apart,
add additional variables as necessary, and form a conjunction of the resulting
clauses. Specifically, if n > 3 and there is a clause of the following form:

(l₁ ∨ l₂ ∨ l₃ ∨ ... ∨ lₙ),

then it will be replaced by the following conjunction of n − 2 clauses
that can be constructed by introducing a set of literals Z₁ through Zₙ₋₃ that
do not otherwise occur in the formula:

(l₁ ∨ l₂ ∨ Z₁) ∧ (¬Z₁ ∨ l₃ ∨ Z₂) ∧ ... ∧ (¬Zₙ₋₃ ∨ lₙ₋₁ ∨ lₙ)

2. If there is any clause with only one or two literals, replicate one of those
literals once or twice so that there is a total of three literals in the clause.

In Exercise 28.4, we prove that w' is satisfiable iff w is. We also prove that
3-conjunctiveBoolean runs in polynomial time.


EXAMPLE  B.2  Boolean  3-Conjunctive  Normal  Form 

Let w be the wff (¬P ∨ ¬R) ∧ (¬P ∨ Q ∨ R ∨ S). We build the 3-conjunctive
normal form wff w' as follows:

• The first clause can be rewritten as (¬P ∨ ¬R ∨ ¬R).

• The second clause can be rewritten as (¬P ∨ Q ∨ Z₁) ∧ (¬Z₁ ∨ R ∨ S).

So the following formula w' is satisfiable iff w is:

(¬P ∨ ¬R ∨ ¬R) ∧ (¬P ∨ Q ∨ Z₁) ∧ (¬Z₁ ∨ R ∨ S).
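The clause-splitting and padding steps are easy to express over a list-of-clauses representation. The Python sketch below assumes, for illustration only, that each clause is a nonempty list of literal strings, that negation is written with a leading "~", and that no variable named Z1, Z2, ... already occurs in the input. The function name is ours.

```python
def three_conjunctive_boolean(clauses):
    """Return clauses with exactly three literals each, satisfiable iff the input is."""
    result = []
    fresh = 0  # number of new Z variables introduced so far
    for clause in clauses:
        lits = list(clause)
        n = len(lits)
        if n > 3:
            # Step 1: split a long clause into n - 2 linked clauses.
            zs = [f"Z{fresh + i + 1}" for i in range(n - 3)]
            fresh += n - 3
            result.append([lits[0], lits[1], zs[0]])
            for i in range(1, n - 3):
                result.append(["~" + zs[i - 1], lits[i + 1], zs[i]])
            result.append(["~" + zs[-1], lits[-2], lits[-1]])
        else:
            # Step 2: pad a short clause by repeating its last literal.
            while len(lits) < 3:
                lits.append(lits[-1])
            result.append(lits)
    return result

# The wff of Example B.2: (~P or ~R) and (~P or Q or R or S).
w = [["~P", "~R"], ["~P", "Q", "R", "S"]]
w_prime = three_conjunctive_boolean(w)
```

Each input clause is visited once and produces at most n − 2 output clauses, which is in keeping with the polynomial running time claimed above.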

Disjunctive  Normal  Form 

We now consider an alternative normal form in which conjunctions of literals are
connected by disjunction (rather than the other way around). A well-formed formula (wff)
of Boolean logic is in disjunctive normal form iff it is the disjunction of one or more
disjuncts, each of which is either a single literal or the conjunction of two or more
literals. All of P, ¬P ∧ ¬R, and P ∧ ¬Q ∧ R ∧ S are disjuncts, and all of the following
formulas are in disjunctive normal form.

• P

• P ∨ ¬Q ∨ R ∨ S

• (P ∧ ¬Q ∧ R ∧ S) ∨ (¬P ∧ ¬R)


Disjunctive  normal  form  is  the  basis  for  a  convenient  notation  for  writing 
queries  against  relational  databases.  (H.5) 


THEOREM  B.3  Disjunctive  Normal  Form  Theorem 

Theorem: Given w, an arbitrary wff of Boolean logic, there exists a wff w' that is in
disjunctive normal form and that is equivalent to w.

Proof: The proof is by a construction similar to the one used to prove Theorem
B.1. The following algorithm disjunctiveBoolean computes w' given w:

disjunctiveBoolean(w: wff of Boolean logic) =

1. Eliminate → and ↔ from w, using the fact that P → Q is equivalent to
¬P ∨ Q and that P ↔ Q is equivalent to (P → Q) ∧ (Q → P).

2. Reduce the scope of each ¬ in w to a single atomic proposition using the
facts:

• Double negation: ¬(¬P) ≡ P.

• de Morgan's laws:

• ¬(P ∧ Q) ≡ (¬P ∨ ¬Q).

• ¬(P ∨ Q) ≡ (¬P ∧ ¬Q).

3. Convert w to a disjunction of disjuncts using the fact that both ∨ and
∧ are associative and the fact that ∨ and ∧ distribute over each other.


EXAMPLE  B.3  Boolean  Disjunctive  Normal  Form 

Let w be the wff P ∧ (Q → ¬(R ∧ T)). Then w can be converted to disjunctive
normal form as follows.

• Step 1 produces: P ∧ (¬Q ∨ ¬(R ∧ T)).

• Step 2 produces: P ∧ (¬Q ∨ ¬R ∨ ¬T).

• Step 3 produces: (P ∧ ¬Q) ∨ (P ∧ ¬R) ∨ (P ∧ ¬T).
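Since each step of the conversion is supposed to preserve equivalence, the result of Example B.3 can be checked against the original wff by comparing truth tables. The following Python sketch does this by brute force. The helper name equivalent and the encoding of formulas as Python functions are our own assumptions.

```python
from itertools import product

def equivalent(f, g, num_vars):
    # True iff f and g agree on every assignment of truth values.
    return all(f(*vals) == g(*vals)
               for vals in product([False, True], repeat=num_vars))

def w(P, Q, R, T):
    # P and (Q -> not(R and T)), with the implication written as (not Q) or ...
    return P and ((not Q) or not (R and T))

def dnf(P, Q, R, T):
    # The result of step 3 in Example B.3.
    return (P and not Q) or (P and not R) or (P and not T)

same = equivalent(w, dnf, 4)
```

A check of this kind examines all 2ⁿ assignments, so it is practical only for formulas with a handful of variables.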


B.1.2  Boolean  Resolution 

Two  of  the  most  important  operations  on  Boolean  formulas  are: 

1. Satisfiability checking: Given a wff ST, is it satisfiable or not? Recall that a wff is
satisfiable iff it is true for at least one assignment of truth values to the variables
it contains.

2. Theorem proving: Given a set of axioms A and a wff ST, does A entail ST? Recall
that A entails ST iff, whenever all of the wffs in A are true, ST is also true.

But note that A entails ST iff A ∧ ¬ST is unsatisfiable. So an algorithm for
determining unsatisfiability can also be exploited as a theorem prover. The technique that
we present next is significant not just because it can be used to reason about Boolean
logic formulas. More importantly, its extension to first-order logic launched the field of
automatic theorem proving.

Resolution:  The  Inference  Rule 

The name "resolution" is used both for an inference rule and a theorem-proving
technique that is based on that rule. We first describe the inference rule. Let Q, ¬Q, P and
R be wffs. Then define:

Resolution: From the premises: (P ∨ Q) and (R ∨ ¬Q),

Conclude: (P ∨ R).

The soundness of the resolution rule is based on the following observation: Assume
that both (P ∨ Q) and (R ∨ ¬Q) are True. Then:

• if Q is True, R must be True.

• if ¬Q is True, P must be True.

Since either Q or ¬Q must be True, P ∨ R must be True. To prove resolution's
soundness, it suffices to write out its truth table. We leave that as an exercise.
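The truth-table argument can also be checked mechanically. The following short sketch (the function name resolution_is_sound is chosen here for illustration) enumerates all eight assignments to P, Q and R and looks for a row in which both premises are True but the conclusion is False.

```python
from itertools import product

def resolution_is_sound():
    """Check every row of the truth table for P, Q and R."""
    for p, q, r in product([False, True], repeat=3):
        premises = (p or q) and (r or not q)
        conclusion = p or r
        if premises and not conclusion:
            return False        # a row that would refute soundness
    return True

print(resolution_is_sound())
```

No refuting row exists, so the function returns True.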

Resolution:  The  Algorithm 

We next present a theorem-proving technique called resolution. It relies on the
inference rule called resolution that we just defined. The core of the prover is an algorithm
that detects unsatisfiability. So a resolution proof is a proof by contradiction (often
called refutation). A resolution proof of a statement ST, given a set of axioms A, is a
demonstration that A ∧ ¬ST is unsatisfiable. If ¬ST cannot be true given A, ST must be.

The  resolution  procedure  takes  as  its  input  a  list  of  clauses.  So,  before  it  can  be  used, 
we  must  convert  the  axioms  in  A  to  such  a  list,  as  follows. 

1.  Convert  each  formula  in  A  to  conjunctive  normal  form. 

2.  Build  L,  a  list  of  the  clauses  that  are  constructed  in  step  1. 

EXAMPLE  B.4  Making  a  List  of  Clauses 

Suppose that we are given the set A of axioms as shown in column 1. We convert
each axiom to conjunctive normal form, as shown in the second column.

Given Axioms        Converted to Conjunctive Normal Form
P                   P
(P ∧ Q) → R         ¬P ∨ ¬Q ∨ R
(S ∨ T) → Q         (¬S ∨ Q) ∧ (¬T ∨ Q)
T                   T

Then the list L of clauses constructed in this process is: P, ¬P ∨ ¬Q ∨ R,
¬S ∨ Q, ¬T ∨ Q, and T.


To prove that a formula ST is entailed by A, we construct the formula ¬ST, convert
it to conjunctive normal form, and add all of the resulting clauses to the list of clauses
produced from A. Then resolution, which we describe next, can begin.

A pair of complementary literals is a pair of literals that are not mutually satisfiable.
So two literals are complementary iff one is positive, one is negative, and they contain
the same propositional symbol. For example, Q and ¬Q are complementary literals.
We'll say that two clauses C1 and C2 contain a pair of complementary literals iff C1
contains one element of the pair and C2 contains the other. For example, the clauses
(P ∨ Q ∨ S) and (T ∨ ¬Q) contain the complementary literals Q and ¬Q.

Consider a pair of clauses that contain a pair of complementary literals, which, without
loss of generality, we'll call Q and ¬Q. So we might have C1 = R1 ∨ R2 ∨ ... ∨ Rj ∨ Q
and C2 = S1 ∨ S2 ∨ ... ∨ Sk ∨ ¬Q. Given C1 and C2, resolution (the inference rule)
allows us to conclude R1 ∨ R2 ∨ ... ∨ Rj ∨ S1 ∨ S2 ∨ ... ∨ Sk. When we apply the
resolution rule in this way, we'll say that we have resolved the parents, C1 and C2, to generate a
new clause, which we'll call the resolvent.

The resolution algorithm proceeds in a sequence of steps. At each step it chooses
from L two clauses that contain complementary literals. It resolves those two clauses
together to create a new clause, the resolvent, which it adds to L. If any step generates
an unsatisfiable clause, then a contradiction has been found. For historical reasons, the
empty clause is commonly called nil, the name given to an empty list in Lisp, the
language in which many resolution provers have been built. The empty clause is
unsatisfiable since it contains no literals that can be made True. So if it is ever generated, the
resolution procedure halts and reports that, since adding ¬ST to A has led to a
contradiction, ST is a theorem given A.


We'll  describe  Lisp  and  illustrate  its  use  for  symbolic  reasoning,  including 
theorem  proving,  in  G.5. 


We can state the algorithm as follows:

resolve-Boolean(A: set of axioms in conjunctive normal form, ST: a wff to be
proven) =

1. Construct L, the list of clauses from A.

2. Negate ST, convert the result to conjunctive normal form, and add the
resulting clauses to L.

3. Until either the empty clause (nil) is generated or no progress is being
made do:

3.1. Choose from L two clauses that contain a pair of complementary literals.
Call them the parent clauses.

3.2. Resolve the parent clauses together. The resulting clause, called the
resolvent, will be the disjunction of all the literals in both parent clauses
except for one pair of complementary literals.

3.3. If the resolvent is not nil and is not already in L, add it to L.

4. If nil was generated, a contradiction has been found. Return success. ST must
be true.

5. If nil was not generated and there was nothing left to do, return failure.
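The procedure can be sketched in Python as follows. The encoding is an assumption made for this illustration: a literal is a string such as "P" or "~P", a clause is a set of literals, and the caller supplies the negated goal already converted to clauses (steps 1 and 2). The function names are hypothetical. The sketch resolves every pair of clauses in each round rather than applying a selection strategy, so it explores the whole space of resolvents.

```python
from itertools import combinations

def negate(lit):
    """Return the complementary literal."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    """All clauses obtained by resolving c1 and c2 on one complementary pair."""
    return [(c1 - {lit}) | (c2 - {negate(lit)})
            for lit in c1 if negate(lit) in c2]

def resolve_boolean(axiom_clauses, negated_goal_clauses):
    """Return True if nil (the empty clause) is generated, else False."""
    clauses = {frozenset(c) for c in axiom_clauses}
    clauses |= {frozenset(c) for c in negated_goal_clauses}
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):      # step 3.1
            for r in resolvents(c1, c2):             # step 3.2
                if not r:
                    return True                      # step 4: nil generated
                new.add(r)
        if new <= clauses:
            return False                             # step 5: nothing left to do
        clauses |= new                               # step 3.3

# The clauses of Example B.4
axioms = [{"P"}, {"~P", "~Q", "R"}, {"~S", "Q"}, {"~T", "Q"}, {"T"}]
print(resolve_boolean(axioms, [{"~R"}]))   # is R entailed?
print(resolve_boolean(axioms, [{"~S"}]))   # is S entailed?
```

With these axioms the first call succeeds, since R is entailed. The second fails, because the axioms are satisfiable with S False. Termination follows from the fact that only finitely many clauses can be built from a finite set of literals.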


EXAMPLE  B.5  Boolean  Resolution 

Given the axioms that we presented in Example B.4, prove R. The axioms, and the
clauses they generate, are shown in the following table.

Given Axioms        Generate the Clauses
P                   P
(P ∧ Q) → R         ¬P ∨ ¬Q ∨ R
(S ∨ T) → Q         ¬S ∨ Q, ¬T ∨ Q
T                   T

We negate R. The result is already in conjunctive normal form, so we simply
add it to the list of clauses:

¬R


We  illustrate  the  resolution  process  by  connecting  each  pair  of  parent  clauses 
to  the  resolvent  that  they  produce. 


In the simple example that we just did, resolve-Boolean found a proof without
trying any unnecessary steps. In general, though, it conducts a search through a space of
possible resolvents. Its efficiency can be affected by the choice of parents in step 3.1. In
particular, the following strategies may be useful.


• Unit preference: All other things being equal, choose one parent that consists of just
a single literal. Then the resolvent will be one literal shorter than the other parent
and thus one literal closer to being the empty clause.

• Set-of-support: Begin by identifying some subset S of L with the property that we can
prove that any contradiction must rely on at least one clause from S. For example, if we
assume that the set of axioms is consistent, then every contradiction must rely on at
least one clause from ¬ST. So we could choose S to be just the clauses in ¬ST. Then, in
every resolution step, choose at least one parent from S and then add the resolvent to S.


The efficiency of resolve-Boolean can also be affected by optimizing step 3.3. One way to
do that is based on the observation that, if the resolvent is subsumed by some clause
already in L, adding it to L puts the process no closer to finding a contradiction. It should
simply be discarded. For example, if P is already in L, it makes no sense to add P ∨ P
or P ∨ Q. At the extreme, if the resolvent is a tautology, it is subsumed by everything.
So adding it to L puts the process no closer to finding a contradiction. It should simply
be discarded. For example, it never makes sense to add a clause such as P ∨ ¬P.
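A minimal sketch of this filter, assuming the same hypothetical encoding as before (literals are strings such as "P" and "~P", clauses are frozensets), might look like this. The names is_tautology, is_subsumed and worth_adding are illustrative only.

```python
def negate(lit):
    """Return the complementary literal."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def is_tautology(clause):
    """A clause that contains a complementary pair is always True."""
    return any(negate(lit) in clause for lit in clause)

def is_subsumed(clause, known):
    """A clause is subsumed by any known clause whose literals it contains."""
    return any(k <= clause for k in known)

def worth_adding(resolvent, known):
    """Filter for step 3.3: discard tautologies and subsumed clauses."""
    return not is_tautology(resolvent) and not is_subsumed(resolvent, known)

L = [frozenset({"P"})]
print(worth_adding(frozenset({"P", "Q"}), L))    # P ∨ Q is subsumed by P
print(worth_adding(frozenset({"Q", "~Q"}), L))   # Q ∨ ¬Q is a tautology
print(worth_adding(frozenset({"Q", "R"}), L))    # neither, so keep it
```

Because clauses are sets, P ∨ P collapses to P, which is already in L and is therefore rejected by the same subsumption test.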

It is possible to prove that the procedure resolve-Boolean is sound. It is also possible
to prove that, as long as resolve-Boolean systematically explores the entire space of
possible resolutions, it is refutation-complete. By that we mean that if A ∧ ¬ST is
unsatisfiable, resolve-Boolean will generate nil and thus discover the contradiction.

But it is important to keep in mind the complexity results that we present in
Chapter 28. We prove, as Theorem 28.16, that the language SAT = {<w> : w is a wff
in Boolean logic and w is satisfiable} is NP-complete. No polynomial-time algorithm
for deciding it is known and it is unlikely that one exists. Unsatisfiability checking
appears to be even harder since unsatisfiability, unlike satisfiability, cannot be verified
just by checking one set of assignments of values to the propositional symbols. As we
see in Section 28.7, the language UNSAT = {<w> : w is a wff in Boolean logic and w
is not satisfiable} is in co-NP (i.e., it is the complement of a language in NP). But it is
thought not to be in NP, much less in P. There are ways to improve the performance of
resolve-Boolean in many cases. But, in the worst case, the time it requires grows
exponentially with the number of clauses in A ∧ ¬ST.


B.1.3  Efficient  SAT  Solvers  and  Ordered  Binary  Decision  Diagrams 

Satisfiability checking plays an important role in many applications, including the
design and analysis of digital circuits, the use of model checking to verify properties of
programs, and the planning algorithms that determine the behavior of robots and other
intelligent systems. While solving SAT in the general case remains hard, substantial
research on the development of efficient satisfiability checkers (or SAT solvers) has led
to the development of practical systems that work very well. In this section, we'll
describe one technique that plays an important role in many efficient SAT solvers. What
we'll do is to describe a new normal form for Boolean formulas. Its advantage is that it
often produces a compact representation that can be exploited efficiently.

For many applications, we will find it useful to think of a Boolean formula P as a
Boolean function of its inputs, so we'll use that notation in the rest of this example,
rather than the wff notation that we introduced in A.1.1. So let f be a Boolean function
of any number of variables. We'll encode True as 1 and False as 0.

One straightforward way to represent f is as a truth table, as we did in A.1.1. An
alternative is as an ordered binary decision tree. In any such tree, each nonterminal
node corresponds to a variable and each terminal node corresponds to the output of
the function along the path that reached that node. From each nonterminal node there
are two edges. One (which we'll draw to the left, with a dashed line) corresponds to the
case where the value of the variable at the parent node is 0. The other (which we'll
draw to the right, with a solid line) corresponds to the case where the value of the
variable at the parent node is 1. To define such a tree for a binary function f, we begin by
defining a total ordering (v1 < v2 < ... < vn) on the n variables that represent the
inputs to f. Any ordering will work, but the efficiency of the modified structure that we
will present below may depend on the ordering that has been chosen. Given an
ordering, we can draw the tree with v1 at the root, v2 at the next level, and so forth.

As an example, consider the function f1(x1, x2, x3) = (x1 ∨ x2) ∧ x3. We can
represent f1, as shown in Figure B.1, as either a truth table or as a binary decision tree
(where the tree is built using the variable ordering (x1 < x2 < x3)).

The size of both the truth table representation and the binary decision tree for a
function f of n variables is O(2^n). Any program M that reasons about f by manipulating
either of those representations (assuming it must consider all of f) will consume at least
O(2^n) space and O(2^n) time. So timereq(M) ∈ Ω(2^n) and spacereq(M) ∈ Ω(2^n). If we
could reduce the size of the representation, it might be possible to reduce both the time
and space requirements of any program that uses it.

If we choose the decision tree representation, it is often possible to perform such a
reduction. We can convert the tree into a directed acyclic graph, called an ordered
binary decision diagram or OBDD. OBDDs, along with algorithms for manipulating
them, were introduced in [Bryant 1986]. Our discussion of them is modeled after
[Bryant 1992] and the examples we show here were taken from that paper.

We can optimize an OBDD by guaranteeing that none of its subtrees occurs more
than once. Starting from the bottom, we will collapse all instances of a duplicate
subtree into a single one. We'll then adjust the links into that unique tree appropriately.
So we begin by creating only two terminal nodes, one labeled 0 and the other labeled 1.
Then we'll move upward collapsing subtrees whenever possible. In the tree we just
drew for f1, for example, observe that the subtree whose root is x3, whose left branch
is the terminal 0 and whose right branch is the terminal 1 occurs three times. So the
three can be collapsed into one. After collapsing them, we get the diagram shown in
Figure B.2(a). At this point, notice the two nodes shown with double circles. Each of
them has the property that its two outgoing edges both go to the same place. In
essence, the value of the variable at that node has no effect on the value that f1
returns. So the node itself can be eliminated. Doing that, we get the diagram shown in
Figure B.2(b).

The process we just described can be executed by the following function
createOBDDfromtree:

createOBDDfromtree(d: ordered binary decision tree) =

1. Eliminate redundant terminal nodes by creating a single node for each label
and redirecting edges as necessary.

2. Until one pass is made during which no reductions occurred do:

2.1. Eliminate redundant nonterminal nodes (i.e., duplicated subtrees) by
collapsing them into one and redirecting edges as necessary.

2.2. Eliminate redundant tests by erasing any node whose two output edges
go to the same place. Redirect edges as necessary.

FIGURE B.1 Representing a function as a truth table or a binary decision tree.

FIGURE B.2 Collapsing nodes to get an efficient OBDD.

This process will create a maximally reduced OBDD, by which we mean that there
is no smaller one that describes the same function and that considers the variables in
the same order. It is common to reserve the term OBDD for such maximally reduced
structures. Given a particular ordering (v1 < v2 < ... < vn) on the n variables that
represent the inputs to some function f, any two OBDDs for f will be isomorphic to each
other (i.e., the OBDD for f is unique up to the order in which the edges are drawn).
Thus the OBDD structure is a canonical form for the representation of Boolean
functions, given a particular variable ordering.
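The reduction performed by createOBDDfromtree can be sketched in Python. The encoding and names here (build_tree, reduce_tree, evaluate, and the dictionary called unique) are assumptions made for this illustration. The sketch also folds the repeated passes of step 2 into a single bottom-up recursion, which is a common way to obtain the same maximally reduced result but is not literally the loop stated above.

```python
from itertools import product

def build_tree(f, n, values=()):
    """Full ordered decision tree: a terminal 0/1, or (level, low, high)."""
    if len(values) == n:
        return int(bool(f(*values)))
    return (len(values),
            build_tree(f, n, values + (0,)),
            build_tree(f, n, values + (1,)))

def reduce_tree(node, unique):
    """Return the id of the reduced node; ids 0 and 1 are the terminals."""
    if isinstance(node, int):
        return node                    # step 1: one node per terminal label
    level, low, high = node
    low, high = reduce_tree(low, unique), reduce_tree(high, unique)
    if low == high:
        return low                     # step 2.2: both edges go to one place
    key = (level, low, high)
    # step 2.1: reuse an identical node if one has already been created
    return unique.setdefault(key, len(unique) + 2)

def evaluate(root, unique, values):
    """Follow the edges selected by `values` down to a terminal."""
    nodes = {node_id: key for key, node_id in unique.items()}
    while root > 1:
        level, low, high = nodes[root]
        root = high if values[level] else low
    return root

def f1(x1, x2, x3):
    return (x1 or x2) and x3

unique = {}
root = reduce_tree(build_tree(f1, 3), unique)
print(len(unique))    # number of nonterminal nodes that remain
```

For f1 the full tree has seven nonterminal nodes. After reduction three remain, one for each variable, and evaluate agrees with f1 on all eight inputs.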

Since the OBDD for a function is unique up to isomorphism, some operations on it
can be performed in constant time. For example, a function f corresponds to a valid wff
(i.e., one that is a tautology) iff its OBDD is identical to the one shown in Figure
B.3(a). A function f corresponds to a satisfiable wff iff its OBDD is not identical to the
one shown in Figure B.3(b).

If the only way to build a reduced OBDD were to start with a decision tree and
reduce it by applying createOBDDfromtree, it would not be practical to work with
functions of a large number of variables, even if the reduced OBDD were of manageable
size. Fortunately, it is possible to build a reduced OBDD directly, without starting
with a complete decision tree.

FIGURE B.3 Exploiting canonical forms.

The size of the OBDD that can be built for a function f may depend critically on the
order that we impose on f's inputs. For example, in the original decision tree that we
built for f1 above, we considered the inputs in the order x1, x2, x3. We could have
produced a slightly smaller OBDD (one with one fewer edge) if we had instead used the
order x3, x1, x2. We leave doing that as an exercise.

In some cases though, the effect of variable ordering is much more significant.
Particularly in many cases of practical interest, in which there are systematic relationships
within clusters of variables, it is possible to build a maximally reduced OBDD that is
substantially smaller than the original decision tree. Consider the Boolean function:

f2(a1, b1, a2, b2, a3, b3) = (a1 ∧ b1) ∨ (a2 ∧ b2) ∨ (a3 ∧ b3).

We'll consider two different variable orderings and the OBDDs that can be created
for them. The first ordering, shown in Figure B.4(a), respects the relationship between
the a, b pairs. The second, shown in Figure B.4(b), does not and pays a price.

FIGURE B.4 The order of the variables matters. (a) (a1 < b1 < a2 < b2 < a3 < b3). (b) (a1 < a2 < a3 < b1 < b2 < b3).

Fortunately, for many classes of important problems there exist heuristics that
find the variable orderings that make small structures possible. Unfortunately, however,
there are problems for which no small OBDD exists. For example, consider a circuit that
implements binary multiplication. Let f be the Boolean function corresponding to
either of the two middle digits of the result of an n-bit multiplication. The size of any
OBDD for f grows exponentially with n.
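The effect of the two orderings in Figure B.4 can be measured with a small experiment. The sketch below is illustrative only: obdd_size and its encoding are assumptions, and it still enumerates every assignment, so it shows the size of the reduced structure rather than an efficient way to build it.

```python
def obdd_size(f, order):
    """Count the nonterminal nodes of the reduced OBDD of f under `order`."""
    unique = {}     # (variable, low id, high id) -> node id; 0 and 1 are terminals

    def build(level, env):
        if level == len(order):
            return int(bool(f(**env)))
        var = order[level]
        low = build(level + 1, {**env, var: 0})
        high = build(level + 1, {**env, var: 1})
        if low == high:
            return low                          # redundant test, skip the node
        return unique.setdefault((var, low, high), len(unique) + 2)

    build(0, {})
    return len(unique)

def f2(a1, b1, a2, b2, a3, b3):
    return (a1 and b1) or (a2 and b2) or (a3 and b3)

print(obdd_size(f2, ["a1", "b1", "a2", "b2", "a3", "b3"]))   # ordering (a)
print(obdd_size(f2, ["a1", "a2", "a3", "b1", "b2", "b3"]))   # ordering (b)
```

Under the first ordering the reduced diagram needs 6 nonterminal nodes. Under the second it needs 14, because the values of a1, a2 and a3 must all be remembered before any b variable is tested.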

Programs that solve problems for which small OBDDs exist may have manageable
requirements for both time and space. In particular, it is known that most common
operations on OBDDs can be done in time that is O(nm), where n and m are sizes of
the input OBDDs. So the OBDD structure improves the expected performance (with
respect to both time and space) of many algorithms on many practical problems.


Model checkers based on OBDDs are routinely used to prove properties of
systems whose state description contains 10^20 states. (H.1.2)


However, because small OBDDs do not exist for all problems, the structure does not
change the worst-case complexity of those problems. Theorem 28.5.2 (the Cook-Levin
theorem) tells us that Boolean satisfiability is NP-complete. No polynomial algorithm for
solving it for all cases is known. So, if we can impose no constraints on the form of the
input, the worst-case time complexity of any algorithm is likely to be O(2^n). While there
is no proof that it is not possible to do better than that, it appears unlikely that we can.


B.2  Working  with  First-Order  Formulas:  Clause  Form  and 
Resolution 

We can extend to first-order logic (FOL) the normal forms and the resolution theorem-
proving procedure that we defined for Boolean logic in the last section.


B.2.1  Clause  Form 

Suppose that we want to build a first-order logic theorem prover that we can use as the
basis for a practical reasoning system. One of the first things that we observe is that the
standard first-order language (the one that we defined in A.1.2) allows quantifiers and
connectors to be embedded in arbitrary ways.

EXAMPLE  B.6  A  Fact  About  Marcus 

Consider the following sentence F:

∀x ((Roman(x) ∧ Know(x, Marcus)) →

(Hate(x, Caesar) ∨ ∀y (∃z (Hate(y, z)) → Thinkcrazy(x, y)))).

F says that any Roman who knows Marcus either hates Caesar or thinks that anyone
who hates anyone is crazy. So if we knew that Paulus was a Roman who knew
Marcus and who didn't hate Caesar, we could use F to conclude that Paulus thinks that
anyone who hates anyone is crazy. Or, if we knew that Paulus was a Roman who
knew Marcus, and that Augustus hates Flavius but Paulus doesn't think Augustus is
crazy, then we could use F to conclude that Paulus hates Caesar. Or, if we knew that
Paulus knows Marcus, doesn't hate Caesar, and doesn't think that Augustus, who
hates Flavius, is crazy, then we could use F to conclude that Paulus is not a Roman.

Each  of  the  arguments  that  we  have  just  described  requires  a  different  way  of 
matching  the  other  facts  we  already  know  against  the  fact  about  Marcus's  friends. 
We'd  like  one  technique  that  works  for  all  of  them. 


One approach to solving this problem is to exploit the idea of a normal form, just as
we did in dealing with Boolean logic formulas. In particular, we can extend the notions
of conjunctive and disjunctive normal forms to first-order logic. Now we must be
concerned both with the structure of the logical connectors (just as we were for Boolean
logic) and with the structure of the quantifiers and variables. The motivation for the
definition of the normal forms we are about to describe is the need to build
theorem-proving programs. The syntax for an arbitrary sentence in first-order logic allows a
great deal of flexibility, making it hard to write programs that can reason with all the
facts that they may be given.

A sentence in first-order logic is in prenex normal form iff it is of the form:

<quantifier list> <matrix>,

where <quantifier list> is a list of quantified variables and <matrix> is quantifier-free.


EXAMPLE  B.7  Prenex  Normal  Form 

∀x (∃y ((P(x) ∧ Q(y)) → ∀z (R(x, y, z)))) is not in prenex normal form.
∀x ∃y ∀z (P(x) ∧ Q(y) → R(x, y, z)) is in prenex normal form. Its matrix
is (P(x) ∧ Q(y) → R(x, y, z)).


Any  sentence  can  be  converted  to  an  equivalent  sentence  in  prenex  normal  form  by 
the  following  procedure. 

1. If necessary, rename the variables so that each quantifier binds a lexically
distinct variable.

2.  Move  all  the  quantifiers  to  the  left,  without  changing  their  relative  order. 

We define the terms literal, clause, and conjunctive normal form for sentences in
first-order logic analogously to the way they were defined for Boolean logic:

• A literal is either a single predicate symbol, along with its argument list, or it is such
a predicate preceded by a single negation symbol. So P(x, f(y)) and ¬Q(x, f(y), z)
are literals. A positive literal is a literal that is not preceded by a negation symbol.
A negative literal is a literal that is preceded by a negation symbol.

• A clause is either a single literal or the disjunction of two or more literals.

• A sentence in first-order logic is in conjunctive normal form iff its matrix is either
a single clause or the conjunction of two or more clauses.

A ground instance is a first-order logic expression that contains no variables. So, for
example, Major-of(Sandy, Math) is a ground instance, but ∀x (∃y (Major-of(x, y))) is not.
A sentence in first-order logic is in clause form iff:

• it has been converted to prenex normal form,

• its quantifier list contains only universal quantifiers,

• its quantifier list is no longer explicitly represented,

• it is in conjunctive normal form, and

• there are no variable names that appear in more than one clause. This last condition
is important because there will no longer be explicit quantifiers to delimit the scope
of the variables. The only way to tell one variable from another will be by their names.

EXAMPLE  B.8  Clause  Form 

The following sentence is not in clause form:

∀x (P(x) → Q(x)) ∧ ∀y (S(y)).

When it is converted to prenex normal form, we get:

∀x ∀y (P(x) → Q(x)) ∧ S(y).

Then, when it is converted to clause form, we get the conjunction of two
clauses:

(¬P(x) ∨ Q(x)) ∧ S(y).


We are going to use clause form as the basis for a first-order, resolution-based proof
procedure analogous to the Boolean procedure that we defined in the last section. To
do that, we need to be able to convert an arbitrary first-order sentence w into a new
sentence w' such that w' is in clause form and w' is unsatisfiable iff w is. In the proof
of the next theorem, we provide an algorithm that does this conversion. All of the steps
are straightforward except the one that eliminates existential quantifiers, so we'll
discuss it briefly before we present the algorithm.

Let Mother-of(y, x) be true whenever y is the mother of x. Consider the sentence
∀x (∃y (Mother-of(y, x))). Everyone has a mother. There is not a single individual
who is the mother of everyone. But, given a value for x, some mother exists. We can
eliminate the existentially quantified variable y using a technique called Skolemization,
based on an idea due to Thoralf Skolem [Skolem 1928]. We replace y by a function of x.
We know nothing about that function. (We don't know, for example, that it is
computable.) We know only that it exists. So we assign the function a name that is not
already being used, say f1. Then:

∀x (∃y (Mother-of(y, x))) becomes ∀x (Mother-of(f1(x), x)).

Multiple existential quantifiers in the same sentence can be handled similarly, but it
is important that a new function be created for each existential quantifier. For
example, consider the predicate Student-data(x, y, z) that is true iff student x enrolled at date
y and has major z. Then:

∀x (∃y (∃z (Student-data(x, y, z)))) becomes ∀x (Student-data(x, f2(x), f3(x))).

The function f2(x) produces a date and the function f3(x) produces a major. Now
consider the predicate Sum(x, y, z), that is true iff the sum of x and y is z. Then:

∀x (∀y (∃z (Sum(x, y, z)))) becomes ∀x (∀y (Sum(x, y, f4(x, y)))).

In this case, the value of z that must exist (and be produced by f4) is a function of
both x and y. More generally, if an existentially quantified variable occurs inside the
scope of n universally quantified variables, it can be replaced by a function of n
arguments corresponding to those n variables. In the simple case in which an existentially
quantified variable occurs inside the scope of no universally quantified variables, it can
be replaced by a constant (i.e., a function of no arguments). So:

∃x (Student(x)) becomes Student(f5).

The functions that are introduced in this way are called Skolem functions and
Skolem constants.
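The replacement rule can be sketched for a sentence that is already in prenex form. Everything about the encoding is an assumption made for illustration: the quantifier list is a Python list of ("forall" or "exists", variable) pairs, the matrix is a nested tuple of strings, and the function skolemize and its fresh-name counter are hypothetical. The sketch also assumes that the variables have been standardized apart and that the generated names f1, f2, ... are not already in use.

```python
from itertools import count

def skolemize(prefix, matrix, fresh=None):
    """Replace each existential variable by a Skolem function or constant.

    Returns (list of universal variables, rewritten matrix)."""
    fresh = fresh or count(1)
    universals, replacement = [], {}
    for quantifier, var in prefix:
        if quantifier == "forall":
            universals.append(var)
        else:
            name = f"f{next(fresh)}"       # assumed to be an unused name
            # A function of the universals in scope, or a constant if none
            replacement[var] = (name, *universals) if universals else name

    def substitute(term):
        if isinstance(term, tuple):
            return tuple(substitute(t) for t in term)
        return replacement.get(term, term)

    return universals, substitute(matrix)

print(skolemize([("forall", "x"), ("exists", "y")],
                ("Mother-of", "y", "x")))
print(skolemize([("forall", "x"), ("forall", "y"), ("exists", "z")],
                ("Sum", "x", "y", "z")))
print(skolemize([("exists", "x")], ("Student", "x")))
```

Each call here starts its own counter, so the generated names begin again at f1. Passing a shared counter as the third argument gives distinct names across sentences, as in the text.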

Skolemization plays a key role in theorem-proving systems (particularly resolution-
based ones) because the Skolemization of a sentence w is unsatisfiable iff w is. But
note that we have not said that a Skolemization is necessarily equivalent to the original
sentence from which it was derived. Consider the simple sentence, ∃x (P(x)). It can be
Skolemized as P(f1). But now observe that:

• ∃x (P(x)) → ∃x (P(x)) is valid (i.e., it is a tautology). It is true in all
interpretations.

• ∃x (P(x)) → P(f1) is satisfiable since it is True if P(f1) is True. But it is
not valid, since it is False if P is true for some value of
x that is different from f1 but False for f1.

So ∃x (P(x)) and P(f1) are not logically equivalent.
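The failure of validity can be seen concretely by enumerating every interpretation over a small domain. The sketch below assumes a two-element domain {0, 1}, which is enough to exhibit a counterexample. The function name and the encoding of an interpretation as a truth assignment for P plus a domain element for the constant f1 are choices made here for illustration.

```python
from itertools import product

def counterexamples(domain=(0, 1)):
    """Interpretations in which  ∃x (P(x)) → P(f1)  is False."""
    found = []
    for truth_values in product([False, True], repeat=len(domain)):
        P = dict(zip(domain, truth_values))    # which elements satisfy P
        for f1 in domain:                      # what the constant f1 denotes
            exists_x_P = any(P[x] for x in domain)
            if exists_x_P and not P[f1]:
                found.append((P, f1))
    return found

for P, f1 in counterexamples():
    print(P, "with f1 =", f1)
```

Two interpretations falsify the implication. In each of them P holds for one element of the domain while f1 denotes the other.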

The proof of the Clause Form Theorem, which we state next, exploits
Skolemization, in combination with standard logical identities, as the basis of an algorithm that
converts any first-order sentence w into another sentence w' that is in clause form and
that is unsatisfiable iff w is.

THEOREM B.4 Clause Form Theorem

Theorem: Given w, a sentence in first-order logic, there exists a clause form
representation w' such that w' is unsatisfiable iff w is.

Proof:  The  proof  is  by  construction. The  following  algorithm  converttodauseform 
computes  a  clause  form  representation  of  ur. 

converttoclauseform(w: first-order sentence) =

1. Eliminate → and ↔, using the fact that P → Q is equivalent to ¬P ∨ Q.

2. Reduce the scope of each ¬ to a single term, using the facts:

• Double negation: ¬(¬P) ≡ P.

• de Morgan's laws:

  • ¬(P ∧ Q) ≡ (¬P ∨ ¬Q).

  • ¬(P ∨ Q) ≡ (¬P ∧ ¬Q).

• Quantifier exchange:

  • ¬∀x (P(x)) ≡ ∃x (¬P(x)).

  • ¬∃x (P(x)) ≡ ∀x (¬P(x)).

3. Standardize apart the variables so that each quantifier binds a unique
variable. For example, given the sentence:

∀x (P(x)) ∨ ∀x (Q(x)),

the variables can be standardized apart to produce:

∀x (P(x)) ∨ ∀y (Q(y)).

4.  Move  all  quantifiers  to  the  left  without  changing  their  relative  order.  At 
this  point,  the  sentence  is  in  prenex  normal  form. 

5.  Eliminate  existential  quantifiers  via  Skolemization,  as  described  above. 

6.  Drop  the  prefix  since  all  remaining  quantifiers  are  universal. 

7. Convert the matrix to a conjunction of clauses by using the fact that
both ∨ and ∧ are associative and the fact that ∨ and ∧ distribute over
each other.

8.  Standardize  apart  the  variables  so  that  no  variable  occurs  in  more  than 
one  clause. 
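Steps 1 and 2 of the algorithm are purely structural rewrites, so they are easy to sketch in code. The following Python fragment is a minimal sketch, assuming a hypothetical representation in which an atom is an opaque string and a compound formula is a tuple whose first element is one of "not", "and", "or", "implies", "iff", "forall" or "exists". The function names are our own. Steps 3 through 8 are not shown.

```python
def eliminate_implications(f):
    """Step 1: rewrite 'implies' and 'iff' using only not, and, or."""
    if isinstance(f, str):                      # an atom such as "Hate(y,z)"
        return f
    op = f[0]
    if op in ("forall", "exists"):              # (quantifier, variable, body)
        return (op, f[1], eliminate_implications(f[2]))
    args = [eliminate_implications(g) for g in f[1:]]
    if op == "implies":
        return ("or", ("not", args[0]), args[1])
    if op == "iff":
        return ("and", ("or", ("not", args[0]), args[1]),
                       ("or", ("not", args[1]), args[0]))
    return (op, *args)                          # not, and, or


def push_negations(f):
    """Step 2: move each 'not' inward until it applies to a single atom.
    Assumes that step 1 has already been applied."""
    if isinstance(f, str):
        return f
    op = f[0]
    if op in ("forall", "exists"):
        return (op, f[1], push_negations(f[2]))
    if op in ("and", "or"):
        return (op, push_negations(f[1]), push_negations(f[2]))
    assert op == "not", "run eliminate_implications first"
    g = f[1]
    if isinstance(g, str):
        return f                                # already a literal
    if g[0] == "not":                           # double negation
        return push_negations(g[1])
    if g[0] in ("and", "or"):                   # de Morgan's laws
        dual = "or" if g[0] == "and" else "and"
        return (dual, push_negations(("not", g[1])), push_negations(("not", g[2])))
    dual = "exists" if g[0] == "forall" else "forall"   # quantifier exchange
    return (dual, g[1], push_negations(("not", g[2])))


# ∀y (∃z (Hate(y, z)) → Thinkcrazy(x, y))
inner = ("forall", "y",
         ("implies", ("exists", "z", "Hate(y,z)"), "Thinkcrazy(x,y)"))
step1 = eliminate_implications(inner)
step2 = push_negations(step1)
print(step2)
```

On this input, step 2 turns the negated existential quantifier into a universal one, which is the effect noted in Example B.9.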


EXAMPLE  B.9  Converting  the  Marcus  Fact  to  Clause  Form 

We now return to F, the statement about Marcus's friends that we introduced in
Example B.6:

∀x ((Roman(x) ∧ Know(x, Marcus)) →
    (Hate(x, Caesar) ∨ ∀y (∃z (Hate(y, z)) → Thinkcrazy(x, y)))).

We convert F to clause form as follows:

• Step 1: Eliminate →. This step produces:

∀x (¬(Roman(x) ∧ Know(x, Marcus)) ∨
    (Hate(x, Caesar) ∨ ∀y (¬∃z (Hate(y, z)) ∨ Thinkcrazy(x, y)))).

• Step 2: Reduce the scope of ¬. This step produces:

∀x (¬Roman(x) ∨ ¬Know(x, Marcus) ∨
    (Hate(x, Caesar) ∨ ∀y (∀z (¬Hate(y, z)) ∨ Thinkcrazy(x, y)))).

(Notice that the existential quantifier disappeared.)

• Steps 3 and 4: Standardize apart and shift the quantifiers to the left. These
steps produce:

∀x ∀y ∀z (¬Roman(x) ∨ ¬Know(x, Marcus) ∨
    (Hate(x, Caesar) ∨ ¬Hate(y, z) ∨ Thinkcrazy(x, y))).

• Steps 5-8: These last steps produce:

¬Roman(x) ∨ ¬Know(x, Marcus) ∨ Hate(x, Caesar) ∨ ¬Hate(y, z)
    ∨ Thinkcrazy(x, y).


EXAMPLE  B.10  Handling  Existential  Quantifiers  and  Standardizing  Apart 

We  convert  the  following  sentence  to  clause  form: 

∀x (Person(x) → (∃y (Mother-of(y, x)) ∧ ∃y (Father-of(y, x)))).

• Step 1: Eliminate →. This step produces:

∀x (¬Person(x) ∨ (∃y (Mother-of(y, x)) ∧ ∃y (Father-of(y, x)))).

• Step 2: Reduce the scope of ¬. This step is not necessary.

• Step 3: Standardize apart the variables so that each quantifier binds a unique
variable.

∀x (¬Person(x) ∨ (∃y1 (Mother-of(y1, x)) ∧ ∃y2 (Father-of(y2, x))))

• Step 4: Move all quantifiers to the left without changing their relative order.

∀x ∃y1 ∃y2 (¬Person(x) ∨ (Mother-of(y1, x) ∧ Father-of(y2, x)))

• Step 5: Eliminate existential quantifiers via Skolemization.

∀x (¬Person(x) ∨ (Mother-of(f1(x), x) ∧ Father-of(f2(x), x)))

• Step 6: Drop the prefix since all remaining quantifiers are universal.

¬Person(x) ∨ (Mother-of(f1(x), x) ∧ Father-of(f2(x), x))

• Step 7: Convert the matrix to a conjunction of clauses.

(¬Person(x) ∨ Mother-of(f1(x), x)) ∧
(¬Person(x) ∨ Father-of(f2(x), x))

• Step 8: Standardize apart the variables so that no variable occurs in more than
one clause.

(¬Person(x1) ∨ Mother-of(f1(x1), x1)) ∧
(¬Person(x2) ∨ Father-of(f2(x2), x2))

Now  the  two  clauses  can  be  treated  as  independent  clauses,  regardless  of  the 
fact  that  they  were  derived  from  the  same  original  sentence. 


The  design  of  a  theorem  prover  can  be  simplified  if  all  of  the  inputs  to  the  theorem 
prover  have  been  converted  to  clause  form. 


EXAMPLE  B.11  Using  the  Marcus  Fact  in  a  Proof 

We now return again to F, the statement about Marcus's friends that we introduced
in Example B.6:

∀x ((Roman(x) ∧ Know(x, Marcus)) →
    (Hate(x, Caesar) ∨ ∀y (∃z (Hate(y, z)) → Thinkcrazy(x, y)))).

When we convert this statement to clause form, we get, as we showed in
Example B.9, the formula that we will call Fc:

(Fc) ¬Roman(x) ∨ ¬Know(x, Marcus) ∨ Hate(x, Caesar)
     ∨ ¬Hate(y, z) ∨ Thinkcrazy(x, y).

In  its  original  form,  F  is  not  obviously  a  way  to  prove  that  someone  isn't  a 
Roman.  But,  in  clause  form  it  is  easy  to  use  for  that  purpose.  Suppose  we  add  the 
following  facts. 

• Know(Paulus, Marcus)

• ¬Hate(Paulus, Caesar)

• Hate(Augustus, Flavius)

• ¬Thinkcrazy(Paulus, Augustus)

We can now prove that Paulus is not a Roman. Paulus knows Marcus, doesn't
hate Caesar, and doesn't think that Augustus, who hates Flavius, is crazy. The
general statement about Marcus's friends must hold for all values of x. In the case of
Paulus, we've ruled out four of the five literals that could make it true. The one
that remains is ¬Roman(Paulus). Note that to implement the reasoning that we
just did, we need a way to match literals like Know(Paulus, Marcus) and
¬Know(x, Marcus). We'll present unification, a technique for doing that, in the
next section.
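The elimination argument above can be traced in a few lines of Python. This is a minimal sketch, assuming a hypothetical encoding of a literal as a triple (sign, predicate, arguments) and a binding for x, y and z that has been chosen by hand. Finding that binding automatically is the job of unification.

```python
# Fc: ¬Roman(x) ∨ ¬Know(x, Marcus) ∨ Hate(x, Caesar) ∨ ¬Hate(y, z) ∨ Thinkcrazy(x, y)
FC = [
    (False, "Roman", ("x",)),
    (False, "Know", ("x", "Marcus")),
    (True, "Hate", ("x", "Caesar")),
    (False, "Hate", ("y", "z")),
    (True, "Thinkcrazy", ("x", "y")),
]

FACTS = {
    (True, "Know", ("Paulus", "Marcus")),
    (False, "Hate", ("Paulus", "Caesar")),
    (True, "Hate", ("Augustus", "Flavius")),
    (False, "Thinkcrazy", ("Paulus", "Augustus")),
}

# Hand-chosen instantiation of the universally quantified variables.
BINDING = {"x": "Paulus", "y": "Augustus", "z": "Flavius"}


def instantiate(literal, binding):
    sign, predicate, args = literal
    return (sign, predicate, tuple(binding.get(a, a) for a in args))


def complement(literal):
    sign, predicate, args = literal
    return (not sign, predicate, args)


ground = [instantiate(lit, BINDING) for lit in FC]

# A literal is ruled out if the known facts assert its complement.
remaining = [lit for lit in ground if complement(lit) not in FACTS]
print(remaining)   # [(False, 'Roman', ('Paulus',))]
```

Since the instantiated clause must be true and four of its five literals contradict known facts, the one remaining literal, ¬Roman(Paulus), must hold.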
B.2.2 First-Order Logic Resolution

First-order logic is undecidable. We stated that result as Theorem 22.4: The language
FOLtheorem = {<A, w> : A is a decidable set of axioms in first-order logic, w is a
sentence in first-order logic, and w is entailed by A} is not in D. As a proof, we
sketched Turing's proof. So there is no algorithm to decide whether or not a statement
is a theorem. But, as we showed in Theorem 22.3, the language FOLtheorem is
semidecidable by an algorithm that constructs a lexicographic enumeration of the
valid proofs given A. Given a statement w, that algorithm will discover a proof if one
exists. To make theorem-proving useful in practical problem domains, however, we
need techniques that are substantially more efficient, at least in many cases, than the
exhaustive enumeration method. Fortunately, such techniques exist. And finding
even better ones remains an active area of research. Keep in mind, however, that
every first-order logic theorem prover has the limitation that, if asked to prove a
nontheorem, it may not be able to tell that no proof exists.

In this section we describe one important proof technique: the extension to first-order
logic of the resolution algorithm that we presented for Boolean logic in B.1.2.
First-order resolution was introduced in [Robinson 1965] and has served, since then, as
the basis for several generations of automatic theorem-proving programs. It is sound
(i.e., it can prove only theorems that are entailed by the axioms it is given). And it is
refutation-complete, by which we mean the following: Given a set of axioms A and a
sentence ST, if ST is a theorem then A ∧ ¬ST will derive a contradiction and the
resolution algorithm, assuming it uses a systematic strategy for exploring the space of
possible resolution steps, will (eventually) find it. We note, however, that first-order
resolution is not complete in the sense that there may be theorems that will not be
generated by any resolution step.


First-order logic resolution is the basis for logic programming languages such
as Prolog (M.2.3). It has played a key role in the evolution of the field of
artificial intelligence (M.2). It has been used to solve problems in domains
ranging from program verification (H.1.1) to medical reasoning. One noteworthy
application in mathematics was the proof of the Robbins Algebra
Conjecture, which had outwitted mathematicians for 60 years.


A first-order logic resolution theorem prover works in essentially the same way a
Boolean one does. It begins with A, a set of axioms that have been converted to clause
form. To prove a statement ST, it negates ST, converts the result to clause form, and
adds it to A. Then, at each resolution step, it chooses two parent clauses that contain
complementary literals, resolves the two clauses together, creates a resolvent, and adds
it to A. If the unsatisfiable clause nil is ever generated, a contradiction has been found.

Unification 

The only new issue that we must face is how to handle variables and functions. In
particular, what does it now mean to say that two literals are complementary? As before,
two literals are complementary iff they are inconsistent. Two literals are inconsistent
iff one of them is positive, one of them is negative (i.e., begins with ¬), they both
contain the same predicate, and they are about intersecting sets of individuals. In other
words, two literals are inconsistent, and thus complementary, iff they make conflicting
claims about at least one individual. To check for this, resolution exploits a matching
process called unification. Unification takes as its arguments two literals, each with
any leading ¬ removed. It will return Fail if the two literals do not match, either
because their predicates are different or because it is not certain that the intersection of
the sets of individuals that they are about is not empty. It will succeed if they do
match. And, in that case, it will return a list of substitutions that describes how one
literal was transformed into the other so that they match and the nonempty intersection
was found.

When are two literals about intersecting sets of individuals? Recall that all clause
variables are universally quantified. So the domains of any two variables overlap. For
example, P(x) and ¬P(y) are complementary literals. One says P is true of everyone;
the other says that P is false of everyone. The domain of any one variable necessarily
includes all specific values. So P(x) and ¬P(Marcus) are complementary since P cannot
be true of everyone but not true of Marcus. P(Caesar) and ¬P(Marcus) are not
complementary since P can be true of Caesar but not of Marcus. P(f(Marcus)) and
¬P(f(Marcus)) are complementary, but P(f(Marcus)) and ¬P(f(Caesar)) are not. While
it is possible that f(Marcus) and f(Caesar) refer to the same individual, it is not certain
that they do. Unification will handle functions by recursively invoking itself. It will
check that function symbols match in the same way that it checks that predicate
symbols match.

If the same variable occurs more than once in a literal, any substitutions that are
made to it must be made consistently to all of its occurrences. So the unification
algorithm must, each time it makes a substitution, apply it to the remainder of both literals
before it can continue. For example, consider unifying Know(x, x) (everyone knows
him/her/itself) and Know(Marcus, Marcus). Unification will match the first x with the
first Marcus, substituting Marcus for x. It will then substitute Marcus for the second
occurrence of x before it continues. It will succeed when it next matches Marcus with
Marcus. But now consider unifying Know(x, x) and Know(Marcus, Caesar). The second
literal, Know(Marcus, Caesar), is not about knowing oneself. Unification will fail in this
case because it will substitute Marcus for x, apply that substitution to the second x, and
then fail to match Marcus and Caesar.

Each invocation of the unification procedure will return either the special value Fail
or a list of substitutions. We will write each list as (subst1, subst2, ...). We will write
each substitution as sub1/sub2, meaning that sub1 is to be written in place of sub2. If
unification succeeds without performing any substitutions, the substitution list will be
nil (the empty list). We are now ready to state the unification procedure:

unify-for-resolution(lit1, lit2: variables, constants, function expressions or positive
literals) =

1. If either lit1 or lit2 is a variable or a constant then:

1.1. Case (checking the conditions in order and executing only the first one
that matches):

   lit1 and lit2 are identical: Return nil.            /* Succeed with no substitution
                                                          required.

   lit1 is a variable that occurs in lit2: Return Fail. /* These two cases implement the
                                                          occur check. See note below.

   lit2 is a variable that occurs in lit1: Return Fail.

   lit1 is a variable: Return (lit2/lit1).

   lit2 is a variable: Return (lit1/lit2).

   otherwise: Return Fail.                             /* Two different constants do
                                                          not match.

2. If the initial predicate or function symbols of lit1 and lit2 are not the same,
return Fail.

3. If lit1 and lit2 do not have the same number of arguments, return Fail.

4. substitution-list = nil.

5. For i = 1 to the number of arguments of lit1 do:

5.1. Let S be the result of invoking unify-for-resolution on the ith argument of
lit1 and of lit2.

5.2. If S contains Fail, return Fail.

5.3. If S is not equal to nil then:

5.3.1. Apply S to the remainder of both lit1 and lit2.

5.3.2.  Append  S  to  substitution-list. 

6.  Return  substitution-list. 
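The procedure translates fairly directly into code. The following Python version is a sketch under several assumptions of our own: variables are instances of a small Var class, constants are strings, literals and function expressions are tuples of the form (symbol, arg1, ..., argn), the value Fail is represented by None, and a substitution sub1/sub2 is stored as the pair (sub1, sub2).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


FAIL = None  # stands in for the special value Fail


def occurs(var, term):
    """True if var appears anywhere inside term."""
    if term == var:
        return True
    return isinstance(term, tuple) and any(occurs(var, a) for a in term[1:])


def replace(term, var, replacement):
    if term == var:
        return replacement
    if isinstance(term, tuple):
        return (term[0],) + tuple(replace(a, var, replacement) for a in term[1:])
    return term


def substitute(term, subs):
    """Apply a list of (replacement, variable) pairs, in order, to term."""
    for replacement, var in subs:
        term = replace(term, var, replacement)
    return term


def unify(lit1, lit2):
    # Step 1: at least one argument is a variable or a constant.
    if not isinstance(lit1, tuple) or not isinstance(lit2, tuple):
        if lit1 == lit2:
            return []                                   # nil
        if isinstance(lit1, Var) and occurs(lit1, lit2):
            return FAIL                                 # occur check
        if isinstance(lit2, Var) and occurs(lit2, lit1):
            return FAIL                                 # occur check
        if isinstance(lit1, Var):
            return [(lit2, lit1)]                       # lit2/lit1
        if isinstance(lit2, Var):
            return [(lit1, lit2)]                       # lit1/lit2
        return FAIL
    # Steps 2 and 3: same symbol and same number of arguments.
    if lit1[0] != lit2[0] or len(lit1) != len(lit2):
        return FAIL
    # Steps 4 through 6.
    subs = []
    args1, args2 = list(lit1[1:]), list(lit2[1:])
    for i in range(len(args1)):
        s = unify(args1[i], args2[i])
        if s is FAIL:
            return FAIL
        if s:
            args1 = [substitute(a, s) for a in args1]
            args2 = [substitute(a, s) for a in args2]
            subs.extend(s)
    return subs


x, y = Var("x"), Var("y")
print(unify(("Likes", x, y), ("Likes", "Flavius", "Marcus")))
print(unify(("Know", x, x), ("Know", "Marcus", "Caesar")))   # None (Fail)
```

One detail worth noting: when both arguments are distinct variables, this sketch follows step 1.1 literally and binds the first to the second. The direction is arbitrary, since either choice yields an equally general result.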

In step 1.1, unify-for-resolution performs a check called the occur check. Consider
attempting to unify f(x) with f(g(x)). Without the occur check, the function
expression g(x) could be unified with x, producing the substitution g(x)/x. But now
there is no way to apply that substitution consistently to the new occurrence of x. In
this case, the problem might simply not be noticed, with the consequence that any
theorem prover that uses the result of the unification may produce an unsound
inference. The problem is even clearer in the following case: Consider attempting to
unify f(x, x) with f(g(x), g(x)). Without the occur check, g(x) could be unified with x,
again producing the substitution g(x)/x. But now, the unification algorithm must
apply that substitution to the remainder of the two clauses before it can continue. So
(, x) and (, g(x)) become (, g(x)) and (, g(g(x))). But now it has to substitute again,
and so forth. Unfortunately, the occur check is expensive. So some theorem-proving
programs omit it and take a chance.

EXAMPLE B.12 Unification Examples

We show the result of unify-for-resolution in each of the following cases:

      Inputs                                      Result                     Substitution

[1]   Roman(x), Roman(Paulus)                     Succeed                    Paulus/x
[2]   Roman(x), Ancient(Paulus)                   Fail
[3]   Roman(father-of(Marcus)), Roman(x)          Succeed                    father-of(Marcus)/x
[4]   Roman(father-of(Marcus)), Roman(Flavius)    Fail
[5]   Roman(x), Roman(y)                          Succeed                    x/y
[6]   Roman(father-of(x)), Roman(x)               Fail (fails occur check)
[7]   Likes(x, y), Likes(Flavius, Marcus)         Succeed                    Flavius/x, Marcus/y

Notice that unify-for-resolution is conservative. It returns a match only if it is certain
that its two arguments describe intersecting sets of individuals. For example,
father-of(Marcus) and Flavius may (but do not necessarily) refer to the same individual.
Without additional information, we do not want resolution to assert a contradiction
between Roman(father-of(Marcus)) and ¬Roman(Flavius).

One other property of unify-for-resolution is worth noting: Consider unifying Roman(x)
with Roman(y). The algorithm as given here returns the substitution x/y. We could,
equivalently, have defined it so that it would return y/x. That choice was arbitrary. But we
could also have defined it so that it returned the substitution Marcus/x, Marcus/y. That
substitution effectively converts a statement that had applied to all individuals into one
that applies only to Marcus. This restricted statement is entailed by the more general one,
so a theorem prover that exploited such a match would still be sound. But proving
statements would become more difficult because resolution is going to look for
contradictions. General statements lead to more contradictions than specific ones do. So we
can maximize the performance of a resolution theorem prover if we exploit a unification
algorithm that returns what we will call a most general unifier, namely a substitution with
the property that no other substitution that preserves soundness imposes fewer restrictions
on the values of the variables. The algorithm that we have presented always returns a most
general unifier.
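The relationship between a most general unifier and a more specific one can be made concrete. In the following Python sketch, a literal is a tuple (predicate, arg1, ...) and a substitution is a dictionary from variable names to their replacements. Both conventions, and the function name apply, are assumptions made for illustration.

```python
def apply(literal, subst):
    """Replace each argument that has a binding in subst; leave the rest alone."""
    predicate, *args = literal
    return (predicate, *(subst.get(a, a) for a in args))


lit1, lit2 = ("Roman", "x"), ("Roman", "y")

general = {"y": "x"}                        # x/y: write x in place of y
specific = {"x": "Marcus", "y": "Marcus"}   # Marcus/x, Marcus/y

# Both substitutions make the two literals identical, so both are unifiers.
assert apply(lit1, general) == apply(lit2, general) == ("Roman", "x")
assert apply(lit1, specific) == apply(lit2, specific) == ("Roman", "Marcus")

# The specific unifier is the general one followed by a further substitution.
further = {"x": "Marcus"}
assert apply(apply(lit2, general), further) == apply(lit2, specific)

# The reverse is impossible: Roman(Marcus) contains no variable to substitute for.
assert apply(("Roman", "Marcus"), {"x": "y", "y": "x"}) == ("Roman", "Marcus")
```

The general unifier leaves x free, so a later resolution step can still bind it to any individual. The specific one has already committed to Marcus.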

We can now define complementary literals analogously to the way they were
defined for Boolean logic. Two literals are complementary literals iff they unify and one
of them is the negation of the other. So, for example, ¬Roman(x) and Roman(Paulus)
are complementary literals. Just as in the case of Boolean logic, the conjunction of a
pair of complementary literals is unsatisfiable.

Resolution: The Algorithm

Now that we have a way to identify complementary literals, we can define the
resolution algorithm for first-order logic. It works the same way that resolution works in
Boolean logic except that two new things need to happen after each resolution step:

• The substitution that was produced when the two complementary literals were
unified must be applied to the resolvent clause. To see why this is important, consider
resolving P(x) ∨ Q(x) with ¬Q(Marcus). The first clause says that, for all values of
x, at least one of P or Q must be true. The second one says that, in the specific case of
Marcus, Q is not true. From those two clauses, it follows that, in the specific case of
Marcus, P must be true. It does not follow that P must be true of all values of x. The
result of unifying Q(x) with Q(Marcus) is the substitution Marcus/x. So we can
construct the result of resolving these two clauses by first building the clause that is
the disjunction of all literals except the two complementary ones. That gives us P(x).
We then apply the substitution Marcus/x to that, which produces P(Marcus).

• We must guarantee that the variable names in the resolvent are distinct from all the
variable names that already occur in any of the clauses in L. If this is not done, it is
possible that later resolution steps will treat two different variables that just happen
to have the same name as though they were a single variable to which consistent
substitutions must be applied. For a concrete example of the problem that this
can cause, see Exercise B.8.
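A single resolution step, including the application of the substitution to the resolvent, can be sketched as follows. To keep the example short, this Python fragment makes simplifying assumptions that are ours, not the text's: arguments are only variables or constants (no function expressions), names that start with a lowercase letter are variables, a literal is a triple (sign, predicate, arguments), and the two parent clauses are assumed to have been standardized apart already.

```python
def is_var(term):
    return term[0].islower()      # assumed convention: lowercase names are variables


def walk(term, subst):
    """Follow bindings until reaching a constant or an unbound variable."""
    while term in subst:
        term = subst[term]
    return term


def unify_flat(args1, args2):
    """Unify two equal-length argument tuples of variables and constants."""
    subst = {}
    for a, b in zip(args1, args2):
        a, b = walk(a, subst), walk(b, subst)
        if a == b:
            continue
        if is_var(a):
            subst[a] = b
        elif is_var(b):
            subst[b] = a
        else:
            return None           # two different constants
    return subst


def resolve(clause1, clause2):
    """Return the resolvent on the first complementary pair found, or None."""
    for i, (sign1, pred1, args1) in enumerate(clause1):
        for j, (sign2, pred2, args2) in enumerate(clause2):
            if sign1 == sign2 or pred1 != pred2 or len(args1) != len(args2):
                continue
            subst = unify_flat(args1, args2)
            if subst is None:
                continue
            # All literals of both parents except the complementary pair.
            rest = clause1[:i] + clause1[i + 1:] + clause2[:j] + clause2[j + 1:]
            # Apply the unifying substitution to what remains.
            return [(s, p, tuple(walk(a, subst) for a in args)) for s, p, args in rest]
    return None


c1 = [(True, "P", ("x",)), (True, "Q", ("x",))]   # P(x) ∨ Q(x)
c2 = [(False, "Q", ("Marcus",))]                   # ¬Q(Marcus)
print(resolve(c1, c2))   # [(True, 'P', ('Marcus',))]
```

Omitting the final substitution would return P(x), which claims more than the parent clauses justify.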

resolve-FOL(A: set of axioms in clause form, ST: a statement to be proven) =

1. Construct L, the list of clauses from A.

2. Rename all variables in ST so that they do not conflict with any variables in L.

3. Negate ST, convert the result to clause form, and add the resulting clauses to L.

4. Until either the empty clause (called nil) is generated or no progress is being
made do:

4.1. Choose from L two clauses that contain a pair CL of complementary
literals. Call them the parent clauses.

4.2. Resolve the parent clauses together to produce a new clause called the
resolvent:

4.2.1. Initially, let the resolvent be the disjunction of all of the literals in
both parent clauses except for the two literals in CL.

4.2.2. Apply to all of the literals in the resolvent the substitution that
was constructed when the literals in CL were unified.

4.2.3. Rename all of the variables in the resolvent so that they do not
conflict with any of the variables in L.

4.3.  If  the  resolvent  is  not  nil  and  is  not  already  in  L,  add  it  to  L. 

5.  If  nil  was  generated,  a  contradiction  has  been  found.  Return  success.  ST  must 
be  true. 

6.  If  nil  was  not  generated  and  there  was  nothing  left  to  do,  return  failure.  ST 
may  or  may  not  be  true.  But  no  proof  of  ST  has  been  found. 
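The control loop of resolve-FOL can be sketched compactly if we accept some restrictions. The Python fragment below is not the full algorithm: as assumptions of our own, it handles only function-free literals, it resolves only when one parent is a unit (single-literal) clause, and it treats names that begin with a lowercase letter as variables. Unit resolution alone is not refutation-complete in general, but it keeps the search finite here. The clauses used are a small hypothetical knowledge base, encoded as tuples of (sign, predicate, arguments) triples.

```python
def is_var(t):
    return t[0].islower()          # assumed convention: lowercase names are variables


def walk(t, s):
    while t in s:
        t = s[t]
    return t


def unify_flat(args1, args2):
    s = {}
    for a, b in zip(args1, args2):
        a, b = walk(a, s), walk(b, s)
        if a == b:
            continue
        if is_var(a):
            s[a] = b
        elif is_var(b):
            s[b] = a
        else:
            return None
    return s


def canonical(clause):
    """Rename variables v0, v1, ... in order of appearance; drop duplicate literals."""
    names, out = {}, []
    for sign, pred, args in clause:
        new_args = tuple(names.setdefault(a, f"v{len(names)}") if is_var(a) else a
                         for a in args)
        if (sign, pred, new_args) not in out:
            out.append((sign, pred, new_args))
    return tuple(out)


def unit_resolvents(unit, clause):
    """Resolve a one-literal clause against each complementary literal of clause."""
    (usign, upred, uargs), = unit
    uargs = tuple(a + "_u" if is_var(a) else a for a in uargs)   # standardize apart
    for i, (sign, pred, args) in enumerate(clause):
        if sign != usign and pred == upred and len(args) == len(uargs):
            s = unify_flat(uargs, args)
            if s is not None:
                rest = clause[:i] + clause[i + 1:]
                yield canonical((sg, p, tuple(walk(a, s) for a in ar))
                                for sg, p, ar in rest)


def resolve_fol(axioms, negated_goal):
    """Return True if nil is derived, False if no new clauses can be produced."""
    clauses = [canonical(c) for c in axioms + negated_goal]
    seen, frontier = set(clauses), list(clauses)
    while frontier:
        new = []
        for c1 in frontier:
            for c2 in list(seen):
                for unit, other in ((c1, c2), (c2, c1)):
                    if len(unit) != 1:
                        continue
                    for r in unit_resolvents(unit, other):
                        if r == ():
                            return True          # nil: contradiction found
                        if r not in seen:
                            seen.add(r)
                            new.append(r)
        frontier = new
    return False


AXIOMS = [
    ((False, "Roman", ("x",)), (False, "Know", ("x", "Marcus")),
     (True, "Hate", ("x", "Caesar")), (False, "Hate", ("y", "z")),
     (True, "Thinkcrazy", ("x", "y"))),
    ((False, "Hate", ("Cornelius", "Caesar")),),
    ((True, "Hate", ("Augustus", "Flavius")),),
    ((False, "Thinkcrazy", ("Cornelius", "Augustus")),),
    ((True, "Roman", ("Cornelius",)),),
]
NEGATED_GOAL = [((True, "Know", ("x1", "Marcus")),)]   # negation of ∃x (¬Know(x, Marcus))

print(resolve_fol(AXIOMS, NEGATED_GOAL))   # True
```

Renaming every resolvent into a canonical form plays the role of step 4.2.3 and also makes the "already in L" test of step 4.3 a simple set lookup.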
EXAMPLE B.13 FOL Resolution

Assume  that  we  are  given  the  following  axioms  (in  clause  form): 

(Fc) ¬Roman(x) ∨ ¬Know(x, Marcus) ∨ Hate(x, Caesar) ∨ ¬Hate(y, z) ∨ Thinkcrazy(x, y).

¬Hate(Cornelius, Caesar).

Hate(Augustus, Flavius).

¬Thinkcrazy(Cornelius, Augustus).

Roman(Cornelius).

We will use resolution to prove ∃x (¬Know(x, Marcus)).

We negate ∃x (¬Know(x, Marcus)), producing ¬(∃x (¬Know(x, Marcus))) or ∀x
(Know(x, Marcus)). Converting this to clause form (and standardizing apart the
variables), we get:

Know(x1, Marcus).

Resolution can now proceed as follows. (But note that this is just one path it could
pursue. It could choose parent clauses in a different order.) We show, at each resolution step,
the substitution that the unification process produced. Note also that the variable names
have been standardized apart at each step.

1. Resolve Know(x1, Marcus) with (Fc), using x1/x (or the other way around):
   ¬Roman(x2) ∨ Hate(x2, Caesar) ∨ ¬Hate(y2, z2) ∨ Thinkcrazy(x2, y2)

2. Resolve the result with ¬Hate(Cornelius, Caesar), using Cornelius/x2:
   ¬Roman(Cornelius) ∨ ¬Hate(y3, z3) ∨ Thinkcrazy(Cornelius, y3)

3. Resolve the result with Roman(Cornelius):
   ¬Hate(y4, z4) ∨ Thinkcrazy(Cornelius, y4)

4. Resolve the result with ¬Thinkcrazy(Cornelius, Augustus), using Augustus/y4:
   ¬Hate(Augustus, z5)

5. Resolve the result with Hate(Augustus, Flavius), using Flavius/z5:
   nil


Resolve-FOL must typically search a space of possible resolution paths. As we saw in the
case of resolve-Boolean, the efficiency of the search process can be affected by the order in
which parent clauses (and complementary literals within them) are chosen. In particular,
both the unit preference strategy and the set-of-support strategy may be useful. The
efficiency of the process can also be improved by failing to insert into L those resolvents that
put the process no closer to finding a contradiction (for example because they are
subsumed by clauses that are already present). Other optimizations are also possible. Even
with them, however, the size of the search space that must be explored may grow too fast to
make resolution a practical solution for many kinds of problems. One way to cut down the
space is to limit the form of the clauses that are allowed. For example, logic programming
languages, such as Prolog, work only with Horn clauses, which may have at most one
positive literal. See M.2.3 for a brief introduction to Prolog and some of its applications.

Resolution:  The  Inference  Rule 

Recall that, in our discussion of resolution in Boolean logic, we pointed out that
resolution is both an inference rule and an algorithm for checking unsatisfiability. The same
is true for resolution in first-order logic. Using the definitions of unification and
substitution that we have just provided, we can state resolution as an inference rule. Let
Q, ¬Q, P1, P2, ..., Pn and R1, R2, ..., Rm be literals, let substitution-list be the
substitution that is returned by unify-for-resolution when it unifies Q and ¬Q, and let
substitute(clause, substitution-list) be the result of applying substitution-list to clause.
Then define:

• Resolution: From the premises: (P1 ∨ P2 ∨ ... ∨ Pn ∨ Q) and
                                 (R1 ∨ R2 ∨ ... ∨ Rm ∨ ¬Q),

  Conclude: substitute((P1 ∨ P2 ∨ ... ∨ Pn ∨ R1 ∨ R2 ∨ ... ∨ Rm),
                       substitution-list).


Exercises 

1. Convert each of the following Boolean formulas to conjunctive normal form.

a. (a ∧ b) → c

b. ¬(a → (b ∧ c))

c. (a ∨ b) → (c ∧ d)

d.  V  (-v  As))) 

2. For each of the following Boolean formulas w, use 3-conjunctiveBoolean to
construct a formula w' that is satisfiable iff w is.

a. (a ∨ b) ∧ (a ∧ ¬b ∧ ¬c ∧ d ∧ e)

b. ¬(a → (b ∧ c))

3. Convert each of the following Boolean formulas to disjunctive normal form.

a. (a ∨ b) ∧ (c ∨ d)

b. (a ∨ b) → (c ∧ d)


4.  Use  a  truth  table  to  show  that  Boolean  resolution  is  sound. 

5.  Use  resolution  to  show  that  the  following  premises  are  inconsistent: 

a  V  -b  V  c.b  V  -d,  -v:  V  d,  b  V  c  V  d^a  V  -^.and-d  V  -5. 

6. Prove that the conclusion b ∧ c follows from the premises: a → (c ∨ d),
b → a, d → c, and b.

a. Convert the premises and the negation of the conclusion to conjunctive normal
form.

b. Use resolution to prove the conclusion.


7. Consider the Boolean function f1(x1, x2, x3) = (x1 ∨ x2) ∧ x3 that we used as an
example in B.1.3. Show how f1 can be converted to an OBDD using the variable
ordering (x3 < x1 < x2).

8. In this problem, we consider the importance of standardizing apart the variables
that occur in a first-order sentence in clause form. Assume that we are given a single
axiom, ∀x (Likes(x, Ice cream)). And we want to prove ∃x (Likes(Mikey, x)).
Use resolution to do this but do not standardize apart the two occurrences of x.
What happens?

9. Begin with the following fact from Example B.6:

[1] ∀x ((Roman(x) ∧ Know(x, Marcus)) →
        (Hate(x, Caesar) ∨ ∀y (∃z (Hate(y, z)) → Thinkcrazy(x, y)))).

Add the following facts.

[2] ∀x ((Roman(x) ∧ Gladiator(x)) → Know(x, Marcus))

[3] Roman(Claudius)

[4] ¬∃x (Thinkcrazy(Claudius, x))

[5] ¬∃x (Hate(Claudius, x))

[6] Hate(Isaac, Caesar)

[7] ∀x ((Roman(x) ∧ Famous(x)) → (Politician(x) ∨ Gladiator(x)))

[8] Famous(Isaac)

[9] Roman(Isaac)

[10] ¬Know(Isaac, Marcus)

a. Convert each of these facts to clause form.

b. Use resolution and this knowledge base to prove ¬Gladiator(Claudius).

c. Use resolution and this knowledge base to prove Politician(Isaac).

10. In M.2.3, we describe a restricted form of first-order resolution called SLD
resolution. This problem explores an issue that arises in that discussion. In particular,
we wish to show that SLD resolution is not refutation-complete for knowledge
bases that are not in Horn clause form. Consider the following knowledge base B
(that is not in Horn clause form).

[1] P(x1) ∨ Q(x1)

[2] ¬P(x2) ∨ Q(x2)

[3] P(x3) ∨ ¬Q(x3)

[4] ¬P(x4) ∨ ¬Q(x4)

a. Use resolution to show that B is inconsistent (i.e., show that the empty clause
nil can be derived).

b. Show that SLD resolution cannot derive nil from B.
In this appendix, we will do, in gory detail, one proof of the correctness of a
construction algorithm.

Theorem 5.3 asserts that, given a nondeterministic FSM M that accepts some
language L, there exists an equivalent deterministic FSM that accepts L. We proved this
theorem by construction. We described the following algorithm:

ndfsmtodfsm(M: NDFSM) =

1. For each state q in K do:

   Compute eps(q).            /* These values will be used below.

2. s' = eps(s)

3. Compute δ':

3.1. active-states = {s'}.    /* We will build a list of all states that are
                                 reachable from the start state. Each element of
                                 active-states is a set of states drawn from K.

3.2. δ' = ∅.

3.3. While there exists some element Q of active-states for which δ' has not yet
been computed do:

     For each character c in Σ do:
         new-state = ∅.
         For each state q in Q do:
             For each state p such that (q, c, p) ∈ Δ do:
                 new-state = new-state ∪ eps(p).
         Add the transition (Q, c, new-state) to δ'.
         If new-state ∉ active-states then insert it into active-states.

4. K' = active-states.

5. A' = {Q ∈ K' : Q ∩ A ≠ ∅}.
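For readers who prefer running code, the construction can be sketched in Python as follows. The representation is an assumption of ours: the transition relation Δ is a dictionary that maps a (state, character) pair to a set of states, with the empty string standing for ε, and each state of the new machine is a frozenset of states of the original one. The sample machine at the end is hypothetical. It accepts the strings over {a, b} that end in ab.

```python
def eps(q, delta):
    """States reachable from q via zero or more ε-transitions (ε is written '')."""
    closure, stack = {q}, [q]
    while stack:
        r = stack.pop()
        for p in delta.get((r, ""), ()):
            if p not in closure:
                closure.add(p)
                stack.append(p)
    return frozenset(closure)


def ndfsm_to_dfsm(K, sigma, delta, s, A):
    closures = {q: eps(q, delta) for q in K}               # step 1
    start = closures[s]                                    # step 2
    active, todo, delta_prime = {start}, [start], {}       # steps 3.1 and 3.2
    while todo:                                            # step 3.3
        Q = todo.pop()
        for c in sigma:
            new_state = frozenset().union(
                *(closures[p] for q in Q for p in delta.get((q, c), ())))
            delta_prime[(Q, c)] = new_state
            if new_state not in active:
                active.add(new_state)
                todo.append(new_state)
    accepting = {Q for Q in active if Q & A}               # step 5
    return active, delta_prime, start, accepting           # K' = active (step 4)


def accepts(dfsm, w):
    _, delta_prime, state, accepting = dfsm
    for c in w:
        state = delta_prime[(state, c)]
    return state in accepting


# Hypothetical NDFSM: state 0 has an ε-transition to state 1.
K = {0, 1, 2, 3}
SIGMA = ("a", "b")
DELTA = {(0, ""): {1}, (1, "a"): {1, 2}, (1, "b"): {1}, (2, "b"): {3}}
dfsm = ndfsm_to_dfsm(K, SIGMA, DELTA, 0, {3})

print(len(dfsm[0]), accepts(dfsm, "aab"), accepts(dfsm, "aba"))   # 4 True False
```

Because δ' is filled in for every reachable set state and every character, the lookup in accepts never fails, which mirrors the determinism claim proved below.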
From any NDFSM M, ndfsmtodfsm constructs a DFSM M', which we claimed is
both (1) deterministic and (2) equivalent to M. We prove those claims here.

Proving 1 is trivial. By the definition in step 3 of δ', we are guaranteed that δ' is defined
for all reachable elements of K' and all possible input characters. Further, step 3 inserts
a single value into δ' for each state, input pair, so M' is deterministic.

Next we must prove 2. In other words, we must prove that M' accepts a string w iff
M accepts w. We constructed the transition function δ' of M' so that M' mimics an "all
paths" simulation of M. We must now prove that that simulation returns the same result
that M would. In particular, δ' defines each individual step of the behavior of M'. We
must show that a sequence of steps of M' mimics the corresponding sequence of steps
of M and then that the results of the two sequences are identical.

So we begin by proving the following lemma, which asserts that entire sequences of
moves of M' behave as they should:

•  Lemma:  Let  w  be  any  string  in  51*.  let  p  and  q  be  any  states  in  K,  and  let  P  be  any 

state  in  A"'. Then: 

(</,  wOl-w*  (P' E) (*7W(<7),  *  (P.  e)  and  peP. 

In  other  words,  if  the  original  NDFSM  M  starts  in  slate  q  and,  after  reading  the 
siring  w.  can  land  in  state  p  (along  at  least  one  of  its  paths),  then  the  new  machine 
A/'  must  behave  as  follows:  When  started  in  the  state  that  corresponds  to  the  set  of 
states  the  original  machine  M  could  get  to  from  q  without  consuming  any  input,  M‘ 
reads  the  string  ir  and  lands  in  one  of  its  new  “set"  states  that  contains  p.  Further¬ 
more,  because  of  the  only-if  part  of  the  lemma.  M'  must  end  up  in  a  "set”  state  that 
contains  only  states  that  M  could  gel  to  from  r/  after  reading  ir  and  following  any  avail¬ 
able  e-transitions. 

To  prove  the  lemma  wc  must  show  that  S'  has  been  defined  so  that  the  individual 
steps  of  AT.  when  taken  together,  do  the  right  thing  for  an  input  siring  w  of  any  length. 
Since  we  know  what  happens  one  step  at  a  lime,  we  will  prove  the  lemma  by  induction 
on  |w|. 

We must first prove that the lemma is true for the base case, where |w| = 0 (i.e.,
w = ε). To do this, we actually have to do two proofs, one to establish it for the if part
of the lemma, and the other to establish it for the only if part:

Base step, if part: Prove that (q, w) |-M* (p, ε) if (eps(q), w) |-M'* (P, ε) and p ∈ P.
Or, turning it around to make it a little clearer:

    [(eps(q), w) |-M'* (P, ε) and p ∈ P] → [(q, w) |-M* (p, ε)].

If |w| = 0, then, since M' contains no ε-transitions, M' makes no moves. So it must
end in the same state it started in, namely eps(q). So P = eps(q). If P contains p, then
p ∈ eps(q). But, given our definition of the function eps, that means exactly that, in the
original machine M, p is reachable from q just by following ε-transitions, which is exactly
what we need to show.

Base step, only if part: We need to show:

    [(q, w) |-M* (p, ε)] → [(eps(q), w) |-M'* (P, ε) and p ∈ P].

If |w| = 0 and the original machine M goes from q to p with only w as input, it must go
from q to p following just ε-transitions. In other words, p ∈ eps(q). Now consider the
new machine M'. It starts in eps(q), the set state that includes all the states that are
reachable from q via ε-transitions. Since M' contains no ε-transitions, it will make no
moves at all if its input is ε. So it will halt in exactly the same state it started in, namely
eps(q). So P = eps(q) and thus contains p. So M' has halted in a set state that includes
p, which is exactly what we needed to show.

Next we'll prove that, if the lemma is true for all strings w of length k, where k ≥ 0,
then it is true for all strings of length k + 1. Any string of length k + 1 must contain at
least one character. So we can rewrite any such string as zx, where x is a single character
and z is a string of length k. The way that M and M' process z will thus be covered by
the induction hypothesis. We use the definition of δ', which specifies how each individual
step of M' operates, to show that, assuming that the machines behave identically for the
first k characters, they behave identically for the last character also and thus for the entire
string of length k + 1. Recall the definition of δ':

    δ'(Q, c) = ∪{eps(p) : ∃q ∈ Q ((q, c, p) ∈ Δ)}.

To prove the lemma, we must show a relationship between the behavior of:

• the computation of the NDFSM M: (q, w) |-M* (p, ε), and

• the computation of the DFSM M': (eps(q), w) |-M'* (P, ε) and p ∈ P.

Rewriting w as zx, we have:

• the computation of the NDFSM M: (q, zx) |-M* (p, ε), and

• the computation of the DFSM M': (eps(q), zx) |-M'* (P, ε) and p ∈ P.

Breaking each of these computations into two pieces, the processing of z followed
by the processing of the single remaining character x, we have:

• the computation of the NDFSM M: (q, zx) |-M* (s_i, x) |-M (p, ε), and

• the computation of the DFSM M': (eps(q), zx) |-M'* (Q, x) |-M' (P, ε) and p ∈ P.

In other words, after processing z, M will be in some set of states S, whose elements
we'll write as s_i. M' will be in some "set" state that we will call Q. Again, we'll split the
proof into two parts:

Induction step, if part: We must prove:

    [(eps(q), zx) |-M'* (Q, x) |-M' (P, ε) and p ∈ P] → [(q, zx) |-M* (s_i, x) |-M (p, ε)].

If, after reading z, M' is in state Q, we know, from the induction hypothesis, that the
original machine M, after reading z, must be in some set of states S and that Q is
precisely that set. Now we just have to describe what happens at the last step when the two
machines read x. If we have that M', starting in Q and reading x, lands in P, then we
know, from the definition of δ' above, that P contains precisely the states that M could
land in after starting in any state in S and reading x. Thus if p ∈ P, p must be a state that
M could land in if started in s_i on reading x.

Induction step, only if part: We must prove:

    [(q, zx) |-M* (s_i, x) |-M (p, ε)] → [(eps(q), zx) |-M'* (Q, x) |-M' (P, ε) and p ∈ P].

By the induction hypothesis, we know that if M, after processing z, can reach some set
of states S, then Q (the state M' is in after processing z) must contain precisely all the
states in S. Knowing that, and our definition of δ', we know that from Q, reading x, M'
must be in some set state P that contains precisely the states that M can reach starting
in any of the states in S, reading x, and then following all ε-transitions. So, after
consuming w (i.e., zx), M', when started in eps(q), must end up in a state P that contains all and
only the states p that M, when started in q, could end up in.

Now that we have proved the lemma, we can complete the proof that M' is
equivalent to M. Consider any string w ∈ Σ*.

If w ∈ L(M) (i.e., the original machine M accepts w) then the following two
statements must be true:

1. The original machine M, when started in its start state, can consume w and end
   up in an accepting state. This must be true given the definition of what it means
   for a machine to accept a string.

2. (eps(s), w) |-M'* (Q, ε) for some Q containing some a ∈ A. In other words, the
   new machine, when started in its start state, can consume w and end up in one of
   its accepting states. This follows from the lemma, which is more general and
   describes a computation from any state to any other. But if we use the lemma and
   let q equal s (i.e., M begins in its start state) and p = a for some a ∈ A (i.e., M
   ends in an accepting state), then we have that the new machine M', when started
   in its start state, eps(s), will consume w and end in a state that contains a. But if
   M' does that, then it has ended up in one of its accepting states (by the definition
   of A' in step 5 of the algorithm). So M' accepts w (by the definition of what it
   means for a machine to accept a string).

If w ∉ L(M) (i.e., the original machine M does not accept w) then the following two
statements must be true:

1. The original machine M, when started in its start state, will not be able to end up
   in an accepting state after reading w. This must be true given the definition of
   what it means for a machine to accept a string.

2. If (eps(s), w) |-M'* (Q, ε), then Q contains no state a ∈ A. In other words, the new
   machine, when started in its start state, cannot consume w and end up in one of
   its accepting states. This follows directly from the lemma.

Thus M' accepts precisely the same set of strings that M does.


APPENDIX D


The Theory: Context-Free Languages and PDAs


In this appendix, we will provide the proofs of three claims that we introduced in
Part III, during our discussion of the context-free languages.


D.1 Proof of the Greibach Normal Form Theorem

In this section, we prove the result that we stated as Theorem 11.2, namely that, given a
context-free grammar G, there exists a Greibach normal form grammar G' such that
L(G') = L(G) - {ε}.

Recall that a grammar G = (V, Σ, R, S) is in Greibach normal form iff every rule in
R has the form:

    X → aβ, where a ∈ Σ and β ∈ (V - Σ)*.
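The form X → aβ can be checked mechanically. The following is a minimal sketch, assuming a rule is represented as a pair (left-hand side, tuple of right-hand-side symbols) and that every symbol not in the terminal set is a nonterminal. The name is_gnf_rule and this representation are illustrative choices, not notation from the text.

```python
def is_gnf_rule(rule, terminals):
    """True iff the rule has the form X -> a beta, with a a terminal
    and beta a (possibly empty) string of nonterminals."""
    _, rhs = rule
    return (
        len(rhs) >= 1
        and rhs[0] in terminals
        and all(sym not in terminals for sym in rhs[1:])
    )


terminals = {"a", "b"}
ok = is_gnf_rule(("X", ("a", "A", "B")), terminals)        # X -> aAB
eps_rule = is_gnf_rule(("A", ()), terminals)               # A -> epsilon
unit_rule = is_gnf_rule(("A", ("B",)), terminals)          # A -> B
mixed_rule = is_gnf_rule(("X", ("A", "a", "B")), terminals)  # X -> AaB
```

The three rejected rules correspond to three of the violating kinds of rules: an epsilon production, a unit production, and a rule with a terminal in a non-leading position.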

So the following kinds of rules violate the Greibach normal form constraints:

• Epsilon productions, i.e., productions of the form A → ε: Given a grammar G,
  ε-rules can be removed by the procedure removeEps that we defined in Section
  11.7.4. The resulting grammar G' will have the property that L(G') = L(G) - {ε}.

• Unit productions, i.e., productions of the form A → B, where B is a single element
  of V - Σ: Given a grammar G, unit productions can be removed by the procedure
  removeUnits that we defined in Section 11.8.3. The resulting grammar G' will have
  the property that L(G') = L(G).

• Productions, such as X → AaB, whose right-hand sides have terminal symbols in
  positions other than the left-most: Given a grammar G, these productions can be
  removed by the procedure removeMixed that we defined in Section 11.8.3. The
  resulting grammar G' will have the property that L(G') = L(G). Note that
  removeMixed actually goes farther than we need to, since it removes all terminals
  except those that stand alone on a right-hand side. So it will rewrite the rule X → aAB,
  even though it is in Greibach normal form.

• Productions, such as X → AB, whose right-hand side begins with a nonterminal
  symbol: We must define a new procedure to handle these productions.

The process of converting a grammar to Chomsky normal form removes all rules in
the first three of these classes. So the algorithm that we are about to present for
converting a grammar G to Greibach normal form will begin by converting G to Chomsky
normal form, using the algorithm that we presented in Section 11.8.3. Note, however,
that Greibach normal form allows rules, such as X → aA and X → aABCD, that are
not allowed in Chomsky normal form. So there exist more efficient Greibach normal
form conversion algorithms than the one we are about to describe.

Our algorithm will also exploit the following operations that we have described
elsewhere:

• Rule substitution allows nonterminals to be replaced, in right-hand sides, by the
  strings that they can derive. Suppose that G = (V, Σ, R, S) contains a rule r of the
  form X → αYβ, where α and β are elements of V* and Y ∈ (V - Σ). Let
  Y → γ1 | γ2 | ... | γn be all of G's Y rules. And let G' be the result of removing from R
  the rule r and replacing it by the rules X → αγ1β, X → αγ2β, ..., X → αγnβ. Then
  Theorem 11.3 tells us that L(G') = L(G).

• The procedure removeleftrecursion, which we defined in Section 15.2.2 as part of
  our discussion of top-down parsing, removes direct left-recursion and replaces it
  by right-recursion. So, for example, if the A rules of G are {A → Ab, A → c},
  removeleftrecursion will replace those rules with the rules {A → cA', A → c,
  A' → bA', A' → b}. Note that the right-hand side of every rule that is introduced by
  removeleftrecursion begins with either a terminal symbol or an element of (V - Σ).
  None of these right-hand sides begins with an introduced nonterminal (such as A').
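Both operations are short enough to sketch in Python. The sketch below assumes a grammar is stored as a dictionary from each nonterminal to a list of right-hand sides (tuples of symbols), that there are no ε-rules and no rule of the form A → A, and that the new nonterminal can be named by appending a prime. The function names substitute and remove_left_recursion are hypothetical stand-ins for the procedures described above.

```python
def substitute(rules, x, rhs, pos=0):
    """Rule substitution: replace the rule x -> rhs by one new rule for
    each right-hand side of the nonterminal Y found at rhs[pos]."""
    y = rhs[pos]
    expanded = [rhs[:pos] + gamma + rhs[pos + 1:] for gamma in rules[y]]
    out = dict(rules)
    out[x] = [r for r in rules[x] if r != rhs] + expanded
    return out


def remove_left_recursion(rules, a):
    """Remove direct left-recursion from the rules for nonterminal a,
    replacing it by right-recursion through a new nonterminal a'."""
    recursive = [rhs[1:] for rhs in rules[a] if rhs[0] == a]  # beta in a -> a beta
    others = [rhs for rhs in rules[a] if rhs[0] != a]
    if not recursive:
        return dict(rules)
    new_nt = a + "'"
    out = dict(rules)
    out[a] = others + [rhs + (new_nt,) for rhs in others]
    out[new_nt] = recursive + [beta + (new_nt,) for beta in recursive]
    return out


# A -> Ab | c, the example from the text.
g1 = remove_left_recursion({"A": [("A", "b"), ("c",)]}, "A")

# S -> AB with A -> XY | c: substitute for the leading A.
g2 = substitute({"S": [("A", "B")], "A": [("X", "Y"), ("c",)]}, "S", ("A", "B"))
```

Here g1 holds the rules A → c | cA' and A' → b | bA', and g2 holds S → XYB | cB alongside the unchanged A rules.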

EXAMPLE D.1 Using Substitution to Convert a Very Simple Grammar

To see how these procedures are used, consider the following grammar, which is
in Chomsky normal form but not in Greibach normal form.

    S → AB
    A → XY | c
    X → a
    Y → b
    B → c

To convert this grammar to Greibach normal form, we:

• use rule substitution to replace the rule S → AB with the rules S → XYB and
  S → cB (since A can derive XY and c). The second of these new rules is in
  Greibach normal form.

• use rule substitution on the first of the new rules and replace it with the rule
  S → aYB (since X can derive a). This new rule is in Greibach normal form.

• use rule substitution to replace the rule A → XY with the rule A → aY (since
  X can derive a). This new rule is in Greibach normal form.

Since the remaining four rules (A → c, X → a, Y → b, and B → c) are already in
Greibach normal form, the process ends with the grammar containing the rules
{S → aYB, S → cB, A → aY, A → c, X → a, Y → b, B → c}.


EXAMPLE D.2 Dealing with Left Recursion

But now consider the following grammar.

    S → SA | BA
    A → a
    B → b

The first rule is left-recursive. If we apply rule substitution to it, we get two
new rules, S → SAA and S → BAA. But now we still have a rule whose right-hand
side begins with S. We can apply rule substitution again, but no matter
how many times we apply it, we will get a new rule whose right-hand side begins
with S. To solve this problem, we must exploit removeleftrecursion to eliminate
direct left-recursion before we apply rule substitution. Doing that, we get the
following.

    S → BAS' | BA
    A → a
    B → b
    S' → AS' | A

Now, to convert this grammar to Greibach normal form, we:

• use rule substitution to replace the rule S → BAS' with the rule S → bAS'.
  This new rule is in Greibach normal form.

• use rule substitution to replace the rule S → BA with the rule S → bA. This
  new rule is in Greibach normal form.

• use rule substitution to replace the rule S' → AS' with the rule S' → aS'. This
  new rule is in Greibach normal form.

• use rule substitution to replace the rule S' → A with the rule S' → a. This new
  rule is in Greibach normal form.

The remaining two rules are already in Greibach normal form, so the process
terminates.


More  realistic  grammars  typically  contain  more  than  a  few  nonterminals  and  those 
nonterminals  may  derive  each  other  in  arbitrary  ways.  To  handle  such  grammars,  we 
need  a  systematic  way  to  organize  the  substitutions  that  will  be  performed. 

So the conversion algorithm we will exploit is the following. It will return a new
grammar it calls G_G.

converttoGreibach(G: CFG in Chomsky normal form) =

1. Choose an ordering of the nonterminals in G. Any ordering will work as long
   as the start symbol comes first. Let G_G initially be G.

2. Rewrite the rules of G_G so that each rule whose left-hand side is one of G's
   original nonterminals is in one of the following two forms:

   • X → aβ, where a ∈ Σ and β ∈ (V - Σ)* (in other words, the rule is in
     Greibach normal form), or

   • X → Yβ, where Y ∈ V - Σ and Y occurs after X in the ordering defined
     in step 1.

   Call the constraint we have just described the rule-order constraint. Note
   that, if any of G's rules are directly left-recursive, this step will add some new
   rules whose left-hand sides are new nonterminals. We will not require that
   these new rules satisfy the rule-order constraint, since the new nonterminals
   are unnumbered. But note that no newly introduced nonterminal will occur
   as the first symbol in any rule's right-hand side.

3. Consider each of G's original nonterminals, starting with the highest numbered
   one, and working backwards. For each such nonterminal N, perform
   substitutions on the rules in G_G so that the right-hand sides of all N rules begin
   with a terminal symbol.

4. Consider each nonterminal N that was introduced by removeleftrecursion. Perform
   substitutions on the rules of G_G so that the right-hand sides of all N rules start with
   a terminal symbol.

5. Return G_G.

The grammar G_G that converttoGreibach returns will be in Greibach normal form.
And L(G_G) = L(G). We'll now describe how to perform steps 2, 3, and 4. Define A_k to
be the kth nonterminal, as defined in step 1.

Step 2: We will first rewrite all the A_1 rules so that they meet the rule-order
constraint. Then we'll do the same for the A_2 rules, and so forth. For each k, as we begin to
transform the rules for A_k, we assume that all rules for nonterminals numbered from 1
to k - 1 already satisfy the rule-order constraint.

Any A_k rule whose right-hand side starts with a terminal symbol already satisfies
the constraint and can be ignored. But we must consider all A_k rules of the form:

    A_k → A_j β.

Group those rules into the following three cases and consider them in this order:

i.   j > k: No action is required.

ii.  j < k: Replace the rule A_k → A_j β by the set of rules that results from
     substituting, for A_j, the right-hand sides of all the A_j rules. Since all A_j rules have already
     been transformed so that they satisfy the rule-order constraint, the right-hand
     sides of all A_j rules start with either terminal symbols or nonterminals numbered
     greater than j. They may still be numbered less than k, however. If any of them
     are, repeat the substitution process. Since the indices must increase by at least 1
     each time, it will be necessary to do this no more than k - 1 times.

iii. j = k: All such rules are of the form A_k → A_k β. They are directly left-recursive.
     Use removeleftrecursion to remove the left-recursion from all A_k rules. Recall
     that removeleftrecursion will create a new set of A_k rules. The right-hand side of
     each such rule will begin with a string that corresponds to the right-hand side of
     some nonrecursive A_k rule. But, as a result of handling all the rules in case ii,
     above, all of those right-hand sides must start with either a terminal symbol or a
     nonterminal symbol numbered above k. So all A_k rules now satisfy the
     rule-order constraint.


EXAMPLE D.3 Performing Step 2 of the Conversion Algorithm

We'll begin with the following grammar in Chomsky normal form.

    S → SB | AB | d
    A → SA | a
    B → SA

We'll order the three nonterminals S, A, B. So first we must rewrite the three S
rules so that they satisfy the rule-order constraint. The second and third of them
already do. But we must rewrite the first one, which is left-recursive. Using
removeleftrecursion, we get the new grammar.

    S → AB | ABS' | d | dS'
    A → SA | a
    B → SA
    S' → B | BS'

Now we consider the A rules. The second one starts with a terminal symbol, but
the first one violates the rule-order constraint since A is numbered 2 and S is
numbered 1. We use rule substitution and replace it with four new rules, one for
each S rule. That produces the following set of A rules.

    A → ABA | ABS'A | dA | dS'A | a

But now the first two of these are left-recursive. So we use removeleftrecursion and
get the following set of A and A' rules. The A rules now satisfy the rule-order constraint.

    A → dA | dAA' | dS'A | dS'AA' | a | aA'
    A' → BA | BAA' | BS'A | BS'AA'

Finally, we consider the single B rule, B → SA. It fails to satisfy the rule-order
constraint since B is numbered 3 and S is numbered 1. We use rule substitution and replace
it with four new rules, one for each S rule. That produces the following set of B rules.

    B → ABA | ABS'A | dA | dS'A

The first two of these fail to satisfy the rule-order constraint since B is
numbered 3 and A is numbered 2. So we use rule substitution again. The first B rule is
replaced by the rules:

    B → dABA | dAA'BA | dS'ABA | dS'AA'BA | aBA | aA'BA.

And the second B rule is replaced by the rules:

    B → dABS'A | dAA'BS'A | dS'ABS'A | dS'AA'BS'A | aBS'A | aA'BS'A.

At this point, the complete grammar is the following (where the B rules are
broken up just for clarity).

    S → AB | ABS' | d | dS'
    A → dA | dAA' | dS'A | dS'AA' | a | aA'
    B → dA | dS'A
    B → dABA | dAA'BA | dS'ABA | dS'AA'BA | aBA | aA'BA
    B → dABS'A | dAA'BS'A | dS'ABS'A | dS'AA'BS'A | aBS'A | aA'BS'A
    S' → B | BS'
    A' → BA | BAA' | BS'A | BS'AA'

This grammar satisfies the rule-order constraint. But it is substantially larger
and messier than the original grammar was. This is typical of what happens when
a grammar is converted to Greibach normal form.

At the end of step 2, all rules whose left-hand sides contain any of G's original
nonterminals satisfy the rule-order constraint. Note also that step 2 preserves the three
properties initially established by conversion to Chomsky normal form: There are no
ε-rules, there are no unit productions, and, in every right-hand side, all symbols after
the first must be nonterminals.

Step 3: Let n be the number of original nonterminals in G. Then A_n is the last of
them (given the order from step 1). The right-hand sides of all A_n rules must begin
with a terminal symbol. This must be true since there are no original nonterminals
numbered higher than n. Now consider the A_(n-1) rules. Their right-hand sides must
begin with a terminal symbol or A_n. Use substitution to replace all the rules whose
right-hand sides start with A_n. After doing that, the right-hand sides of all the A_(n-1)
rules will start with terminal symbols. Continue working backwards until the A_1 rules
have been processed in this way. This step also preserves the three properties initially
established by conversion to Chomsky normal form. So, at the end of this step, all
rules whose left-hand sides contain any of G's original nonterminals are in Greibach
normal form.

Step 4: The removeleftrecursion procedure introduces new nonterminal symbols
and new rules with those symbols as their left-hand sides. So there will be new rules
like S' → AS' | A. The new nonterminals are independent of each other, so the
right-hand sides of all of their rules consist only of terminals and original
nonterminals. If r is one of those rules and r is not already in Greibach normal form then
it is N → A_j β for some introduced nonterminal N and some original nonterminal A_j.
As a result of step 3, all A_j rules are already in Greibach normal form. So a single
substitution for A_j will replace r by a set of N rules in Greibach normal form. This step
preserves all of the properties that were true at the end of step 3. So, at the end of this
step, G_G is in Greibach normal form.


EXAMPLE D.4 Performing Steps 3 and 4 of the Conversion

We'll continue with the grammar from Example D.3. After step 2, it was as follows.

    S → AB | ABS' | d | dS'
    A → dA | dAA' | dS'A | dS'AA' | a | aA'
    B → dA | dS'A
    B → dABA | dAA'BA | dS'ABA | dS'AA'BA | aBA | aA'BA
    B → dABS'A | dAA'BS'A | dS'ABS'A | dS'AA'BS'A | aBS'A | aA'BS'A
    S' → B | BS'
    A' → BA | BAA' | BS'A | BS'AA'

Step 3: All the B rules must be in Greibach normal form. It turns out that, in this
example, the A rules are also. But then we must consider the S rules. The first two of
them have right-hand sides that do not begin with terminal symbols. So they must be
rewritten using substitution. After doing that, the complete set of S rules is as follows.

    S → dAB | dAA'B | dS'AB | dS'AA'B | aB | aA'B
    S → dABS' | dAA'BS' | dS'ABS' | dS'AA'BS' | aBS' | aA'BS'
    S → d | dS'

Step 4: We must use substitution on both of the S' rules. The two of them will
be replaced by the following set of S' rules.

    S' → dA | dS'A
    S' → dABA | dAA'BA | dS'ABA | dS'AA'BA | aBA | aA'BA
    S' → dABS'A | dAA'BS'A | dS'ABS'A | dS'AA'BS'A | aBS'A | aA'BS'A
    S' → dAS' | dS'AS'
    S' → dABAS' | dAA'BAS' | dS'ABAS' | dS'AA'BAS' | aBAS' | aA'BAS'
    S' → dABS'AS' | dAA'BS'AS' | dS'ABS'AS' | dS'AA'BS'AS' | aBS'AS' | aA'BS'AS'

And similarly for the A' rules. We'll skip writing them all out. There are
14 (the number of B rules) × 4 (the number of A' rules) = 56 of them.

So the original, 6-rule grammar in Chomsky normal form becomes a 118-rule
grammar in Greibach normal form.
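The rule counts in Examples D.3 and D.4 can be reproduced with a short Python sketch of steps 2 through 4. It is an illustration under assumptions, not the text's own code: the name convert_to_greibach and its helper expand_first are hypothetical, a grammar is a dictionary from nonterminals to lists of right-hand-side tuples, the input is assumed to be in Chomsky normal form already, and each new nonterminal is named by appending a prime.

```python
def convert_to_greibach(rules, order):
    """rules: nonterminal -> list of right-hand sides (tuples of symbols).
    order: the original nonterminals, start symbol first (step 1)."""
    g = {n: list(rhss) for n, rhss in rules.items()}
    index = {n: k for k, n in enumerate(order)}
    introduced = []

    def expand_first(rhss, should_expand):
        """Substitute for the leading symbol for as long as should_expand says so."""
        out, todo = [], list(rhss)
        while todo:
            rhs = todo.pop(0)
            if should_expand(rhs[0]):
                todo = [gamma + rhs[1:] for gamma in g[rhs[0]]] + todo
            elif rhs not in out:
                out.append(rhs)
        return out

    # Step 2: establish the rule-order constraint.
    for k, a in enumerate(order):
        # Case ii: leading nonterminal numbered below k.
        g[a] = expand_first(g[a], lambda y: y in index and index[y] < k)
        # Case iii: direct left-recursion.
        recursive = [rhs[1:] for rhs in g[a] if rhs[0] == a]
        if recursive:
            others = [rhs for rhs in g[a] if rhs[0] != a]
            new_nt = a + "'"
            introduced.append(new_nt)
            g[a] = others + [rhs + (new_nt,) for rhs in others]
            g[new_nt] = recursive + [beta + (new_nt,) for beta in recursive]

    # Step 3: original nonterminals, highest numbered first.
    for a in reversed(order):
        g[a] = expand_first(g[a], lambda y: y in index)

    # Step 4: nonterminals introduced by left-recursion removal.
    for n in introduced:
        g[n] = expand_first(g[n], lambda y: y in index)
    return g


# The grammar of Example D.3, with the ordering S, A, B.
cnf = {
    "S": [("S", "B"), ("A", "B"), ("d",)],
    "A": [("S", "A"), ("a",)],
    "B": [("S", "A")],
}
gnf = convert_to_greibach(cnf, ["S", "A", "B"])
counts = {n: len(rhss) for n, rhss in gnf.items()}
```

For this input, counts records 14 S rules, 6 A rules, 14 B rules, 28 S' rules and 56 A' rules, for a total of 118.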


THEOREM D.1 Greibach Normal Form Grammar

Theorem: Given a context-free grammar G, there exists a Greibach normal form
grammar G_G such that L(G_G) = L(G) - {ε}.

Proof: The proof is by construction, using the algorithm converttoGreibach
described above.

D.2 Proof that the Deterministic Context-Free Languages
are Closed Under Complement

In this section, we prove the result that we stated as Theorem 13.10.

THEOREM D.2 Closure Under Complement

Theorem: The deterministic context-free languages are closed under complement.

Proof: The proof is by construction. The construction exploits techniques that we
used to prove several other properties of the context-free languages, but now we
must be careful to preserve the property that the PDA we are working with is
deterministic.

If L is a deterministic context-free language over the alphabet Σ, then L$ is
accepted by some deterministic PDA M = (K, Σ ∪ {$}, Γ, Δ, s, A). We need to
describe an algorithm that constructs a new deterministic PDA that accepts (¬L)$.
The algorithm will proceed in two main steps:

1. Convert M to an equivalent PDA M''' that is in a constrained form that we
   will call deterministic normal form.

2. From M''', build M# to accept (¬L)$.

The design of deterministic normal form is motivated by the observation that
a deterministic PDA may fail to accept an input string w for any one of several
reasons:

1. Its computation ends before it finishes reading w.

2. Its computation ends in an accepting state but the stack is not empty.

3. Its computation loops forever, following ε-transitions, without ever halting
   in either an accepting or a nonaccepting state.

4. Its computation ends in a nonaccepting state.

If we attempt to build M# by simply swapping the accepting and nonaccepting
states of M, we will build a machine that correctly fails to accept every string that
M would have accepted (i.e., every string in L$). But it cannot be guaranteed to
accept every string in (¬L)$. To do that, we must also address issues 1 through 3 above.
Converting M to deterministic normal form will solve those problems since any
deterministic PDA in deterministic normal form will, on any input w$:

• read all of w,

• empty its stack, and

• halt.

One additional problem is that we don't want to accept ¬L(M). That includes
strings that do not end in $. We must accept only strings that do end in $ and that
are in (¬L)$.

Given a deterministic PDA M, we convert it into deterministic normal
form in a sequence of steps, being careful, at each step, not to introduce
nondeterminism.

In the first step, we will create M', which will contain two complete copies of
M's states and transitions. M' will operate in the first copy until it reads the
end-of-input symbol $. After that, it will operate in the second copy. Call the states in
the first copy the pre$ states. Call the states in the second copy the post$ states. If
q is a pre$ state, call the corresponding post$ state q'. If q is an accepting state,
then add q' to the set of accepting states and remove q from the set. If M contains
the transition ((q, $, α), (p, β)) and q is a pre$ state, remove that transition and
replace it with the transition ((q, $, α), (p', β)). Now view M' as a directed graph
but ignore the actual labels on the transitions. If there are states that are
unreachable from the start state, delete them. If M was deterministic, then M' also is and
L(M') = L(M).

If M' ever follows a transition from a post$ state that reads any input then it
must not accept. So we can remove all such transitions without changing the
language that is accepted. Remove them. Now all transitions out of post$ states read
no input. So they are one of the following:

• stack-ε-transitions: ((p, ε, ε), (q, γ)) (nothing is popped), or

• stack-productive transitions: ((p, ε, α), (q, γ)), where α ∈ Γ+ (something is
  popped).

Next we remove all stack-ε-transitions from post$ states. To construct an
algorithm to do this, observe:

• since M' is deterministic, if it contains the stack-ε-transition ((p, ε, ε), (q, γ))
then it contains no other transitions from p.

• if ((p, ε, ε), (q, γ)) ever plays a role in causing M' to accept a string then there
must be a path from q that eventually reaches an accepting state and clears
the stack.

So we can eliminate the stack-ε-transition ((p, ε, ε), (q, γ)) as follows: First, if
q is accepting, make p accepting. Next, delete ((p, ε, ε), (q, γ)) and replace it by
transitions that go directly from p to wherever q could go, skipping the move to q.

So consider every transition ((q, ε, α), (r, β)). If α = ε, then add the transition
((p, ε, ε), (r, βγ)). Otherwise, if γ = ε then add the transition ((p, ε, α), (r, β)).
Otherwise, suppose that γ is γ_1γ_2...γ_n. If α = γ_1γ_2...γ_k for some k ≤ n, then
add the transition ((p, ε, ε), (r, βγ_(k+1)...γ_n)). In other words, don't bother to
push the part that the second transition would have popped off. If α = γη for
some η ≠ ε, then add the transition ((p, ε, η), (r, β)). In other words, skip
pushing γ and then popping it. Just pop the rest of what the second transition would
pop. If any new stack-ε-transitions from p have been created, then replace them
as just described except that, if the process creates a transition of the form
((p, ε, ε), (p, γ')), where γ' is not shorter than γ from the first transition that was
removed, then the new transition is not describing a path that can ever lead to
M' clearing its stack and accepting. So simply delete it. Continue until all
stack-ε-transitions have been removed. With a bound on the length of the string
that gets pushed when a new transition is created, this process must eventually
halt. Since there was no nondeterminism out of q, there won't be nondeterminism
out of p when p simply copies the transitions from q.
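
The case analysis above can be sketched in Python. This is only an illustration, not part of the construction itself: the function name compose is hypothetical, stack strings are written top-first, and the empty string stands for ε. The final None case is an assumption read off from the cases in the text: if neither γ nor α is a prefix of the other, the second transition can never fire immediately after the first, so no replacement transition is produced.

```python
from typing import Optional, Tuple


def compose(gamma: str, alpha: str, beta: str) -> Optional[Tuple[str, str]]:
    """Combine ((p, ε, ε), (q, gamma)) with ((q, ε, alpha), (r, beta)).

    Returns (pop, push) describing the single replacement transition
    ((p, ε, pop), (r, push)), or None when alpha can never match the
    stack contents that the first transition leaves behind.
    """
    if alpha == "":                  # the second move pops nothing
        return "", beta + gamma
    if gamma == "":                  # the first move pushed nothing
        return alpha, beta
    if gamma.startswith(alpha):      # alpha = γ_1...γ_k for some k <= n
        return "", beta + gamma[len(alpha):]
    if alpha.startswith(gamma):      # alpha = γη with η nonempty
        return alpha[len(gamma):], beta
    return None                      # pushed and popped strings disagree
```

For instance, with γ = xyz, α = xy and β = b, the pair of moves collapses to a transition that pops nothing and pushes bz.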

At this point, M' has the following properties:

• Every transition out of a post$ state pops at least one character off the stack.

• No transition out of a post$ state reads any input.

• All accepting states are post$ states.

Next, we consider problem 2 as described above (M doesn't accept because its
stack isn't empty). That problem would go away if our definition of acceptance
were by accepting state alone, rather than by accepting state and empty stack.
Recall that, in Example 12.14, we presented an algorithm that constructs, from
any PDA that accepts by accepting state and empty stack, an equivalent one
that accepts by accepting state alone. The resulting PDA has a new start state s'
that pushes a new symbol # onto the stack. It also has a single accepting state, a
new state q_a, which is reachable only when the original machine would have
reached an accepting state and had an empty stack. Our next step will be to
apply that algorithm to M' to produce M''. Once we've done that, we can later
make q_a nonaccepting and thus reject every string in L$. At the same time, we
are assured that doing so will not cause M'' to reject any string that was not in
L$, since no such string can drive M'' to q_a. The only issue we must confront is
that the algorithm of Example 12.14 may convert a deterministic PDA into a
nondeterministic one because transitions into q_a may compete with other
transitions that were already present (as one does in the example we considered
when we presented the algorithm). But that cannot happen in the machine M''
that results when the algorithm is applied to M'. Each new transition into the
new state q_a has the form ((a, ε, #), (q_a, ε)), where a is a post$ state. No
transition in M' pops # since # is not in its stack alphabet. And there are no
stack-ε-transitions from a post$ state in M' (because all such transitions have
already been eliminated). So we can guarantee that M'' is equivalent to M' and
is still deterministic. We also know that, whenever M'' is in any state except the
new start state s' and the new accepting state q_a, there is exactly one # on the
stack and it is on the bottom.

Note that we have not switched PDA definitions. We will still accept by accepting
state and empty stack. So it will be necessary later to make sure that the final
machine that we build can empty its stack on any input it needs to accept.

Next we consider problem 1 (M halts without reading all its input). We must
complete M'', by adding a dead state, in order to guarantee that, from any
configuration in which there may be unread input characters (i.e., any configuration
with a pre$ state), M'' has a move that it can make. The problem is that it is not
sufficient simply to assure that there is a move for every input character. Consider
for example a PDA M#, where Σ = {a, b}, Γ = {#, 1, 2}, and the transitions
from state q are ((q, a, 1), (p, 2)) and ((q, b, 1), (r, 2)). If M# is in state q and the
character on the top of the stack is 2, M# cannot move.

We can't solve this problem just by requiring that there be some element of Δ
for each (input character, stack character) pair because we allow arbitrarily long
strings to be popped from the stack on a single move. For example, again let
Σ = {a, b} and Γ = {#, 1, 2}. Suppose that the transitions from state q are:

((q, a, 12), (p, 2)),

((q, a, 21), (p, 2)),

((q, b, 122), (r, 2)), and

((q, b, 211), (r, 2)).

If the top of the stack is 22 and the next input character is a or b, M# cannot move.

So our next step is to convert M'' into a new machine M''' with the following
property: Every transition, except the one from the start state s', pops
exactly one symbol. Note that this is possible because, in every state except s'
and the one accepting state q_a, # is on the bottom of the stack. And there are
no transitions out of q_a. So there always exists at least one symbol that can be
popped. To build M''' we use a slight variant of the technique we used in the
algorithm convertPDAtorestricted that we described in Section 12.3. We replace
any transition that popped nothing with a set of transitions, one for each
element of Γ''. These transitions pop a symbol and then push it back on. And we
replace any transition that popped more than one symbol with a sequence of
transitions that pops them one at a time. To guarantee that no nondeterminism
is introduced when we do this, it is necessary to be careful when creating new
states as described in step 6. If, from some state q, there is more than one
transition that pops the same initial sequence of characters, all of them must stay
on the same path until they actually pop something different or read a different
input character.

Next we add two new dead states, d and d'. The new dead state d will contain
strings that do not end in $. The new dead state d' will contain strings that do end
in $. For every character c ∈ Σ, add the transition ((d, c, ε), (d, ε)). So, if M''' ever
goes to d, it can loop in d and finish reading its input up until it hits $. Then add
the transition ((d, $, ε), (d', ε)). So M''' moves from d to d' when it encounters $.
Finally, we must make sure that, from d', M''' can clear its stack. So, for every
symbol γ in Γ, add the transition ((d', ε, γ), (d', ε)). After adding those
transitions, every symbol except # can be removed. Note that none of these new
transitions compete with each other, so M''' is still deterministic.

Now we can modify M''' so that it always has a move to make from any pre$
state. To do this, we add transitions into the new dead states. M''' always has a
move from s', so we don't have to consider it further. In order to guarantee that
M''' will always be able to make a move from any other pre$ state q, it must be
the case that, for every (q, c, γ), where q is a pre$ state, c ∈ Σ ∪ {$}, and γ ∈ Γ''',
there exists some (p, α) such that either:

• M''' contains the ε-transition ((q, ε, γ), (p, α)), or

• M''' contains the transition ((q, c, γ), (p, α)).

Since M''' is deterministic, it is not possible for M''' to contain both those
transitions. Now consider any stack symbol γ and state q. If M''' contains an
ε-transition ((q, ε, γ), (p, α)), no others from q that pop γ are required. If it does
not, then there must be one for each character c in Σ ∪ {$}. If there is no
transition ((q, $, γ), (p, α)), then we add to M''' the transition ((q, $, γ), (d', ε)). If, for
any other character c, there is no transition ((q, c, γ), (p, α)), then we add to M'''
the transition ((q, c, γ), (d, ε)).
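
The completion step for pre$ states can be sketched as follows. This is a minimal illustration under stated assumptions: the function name complete_pre_states and the dictionary encoding of Δ are hypothetical, every transition is assumed to pop exactly one symbol already, the empty string stands for ε, and the dead states are simply named by the strings "d" and "d'".

```python
def complete_pre_states(delta, pre_states, sigma, gamma, d="d", d_prime="d'"):
    """Return a copy of delta in which every pre$ state has a move for every
    (input character, top-of-stack symbol) pair.

    delta maps (state, input, popped symbol) to (next state, pushed string);
    "" stands for ε and "$" is the end-of-input marker.
    """
    completed = dict(delta)
    for q in pre_states:
        for top in gamma:
            if (q, "", top) in delta:
                continue  # an ε-transition already covers this stack symbol
            for c in list(sigma) + ["$"]:
                if (q, c, top) not in delta:
                    # Missing move: "$" leads to d', any other character to d.
                    completed[(q, c, top)] = (d_prime if c == "$" else d, "")
    return completed
```

Existing transitions are left untouched, and nothing is added for a stack symbol that already has an ε-transition, so the sketch does not create competing moves.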

At this point, we know that, until it has read all its input, M''' will always have
a move to make. And we know that any string that drives it to d' is in (¬L)$. So
in the complement machine we are eventually trying to build, d' should accept
any strings it sees. To do that, it must first clear the stack.

Next we make sure that, from every post$ state except q_a, M''' always has a
move it can make. There is no input to be read, so we must assure that, for every
post$ state q (except q_a) and every stack symbol γ ∈ Γ, there is a move. When M
would have died, M''' needs to move to a state that knows that $ has been read
and that can clear the stack (so that its complement will eventually be able to
accept). That state is d'. So, if (q, ε, γ) is a triple for which no move is defined, add
the transition ((q, ε, γ), (d', ε)).

Next, we must make sure that M''' never gets into a loop that is not making
progress toward at least one of the two things that must occur before it can
accept: emptying the stack and consuming the input. M''' determines its next
move by considering only its current state, the top stack symbol and the current
input character. Any transition that reads an input character makes progress, so
we need only worry about those that do not. Suppose that some triple (q, ε, γ)
matches against the current configuration of M'''. If that triple ever matches again
and no progress has been made, then none will ever be made because M''', since it is
deterministic, will simply do the same thing the second time. So we must find all
the triples with the property that, when they match the configuration of M''', no
progress occurs. Call these triples dead triples. We now build a new machine M''''
which is identical to M''' except that all dead triples that originate in a pre$ state
will drive M'''' to d and all dead triples that originate in a post$ state will drive
M'''' to d'. So M'''' = M''', except:

• if (q, ε, γ) is a dead triple and q is a pre$ state then delete any transition
((q, ε, γ), (p, β)) and replace it by ((q, ε, γ), (d, ε)).

• if (q, ε, γ) is a dead triple and q is a post$ state then delete any transition
((q, ε, γ), (p, β)) and replace it by ((q, ε, γ), (d', ε)).

Now M'''' has the following properties:

1. On input w$, if M's computation would have ended before all of w$ were
read, M'''' will be able to reach state d' and have the stack empty except for #.

2. On input w$, if M's computation would have looped forever, following
ε-transitions, without ever halting in either an accepting or a nonaccepting
state, M'''' will be able to reach state d' and have the stack empty except
for #.

3. On input w$, iff M's computation would have accepted, M'''' will be in state
q_a and its stack will be empty.

4. On any input that does not end in $, M'''' will be in some pre$ state.

Our final step will be to construct M# that accepts (¬L)$. We'll do that by
starting with M'''', making q_a nonaccepting, and creating a path by which d' can
pop the remaining # and go to an accepting state. But, before we can do that, we
must consider two remaining cases:

• On input w$, M would have finished reading w$ but not emptied its stack.

• On input w$, M would have finished reading w$ and landed in a nonaccepting state.

We need to make sure that, in both of those cases, our final machine will be
able to accept. Note that we only want to accept after reading $, so we need
only worry about what M'''' should do once it has reached some post$ state. We
first guarantee that M'''' can clear its stack except for #. We do that as follows:
For every post$ state q in M'''' (except q_a) and every symbol c in Γ, if M'''' does
not contain a transition for the triple (q, ε, c), add the transition
((q, ε, c), (d', ε)). (If M'''' already contains a transition for the triple (q, ε, c) then
that transition must be on a path to clearing the stack or it would already have
been eliminated.)

It's now the case that every string of the form w$, where w ∈ Σ*, will drive
M'''' to some post$ state and either the state is q_a, in which case the stack will
be empty, or the state is something else, in which case the stack contains
exactly #. So our next step is to add a new state d''. From every post$ state q
except q_a and any states from which there is a transition into q_a, add the
transition ((q, ε, #), (d'', ε)). Since there were no transitions on # from any of those
states, the resulting machine is still deterministic.

At this point, M'''' is in deterministic normal form. We can now define:

convertPDAtodetnormalform(M: deterministic PDA) =

1. Return M'''', constructed as described previously.

Note that M'''' still accepts L$ and it is deterministic. It is also in restricted
normal form (as defined in Section 12.3.2).

All that remains is to build M# to accept (¬L)$. Let M# = M'''' except that d''
will be the only accepting state. There are no transitions out of d'', so there is
never competition between accepting and taking some transition. All and only
strings of the form w$, where w ∈ Σ* and w$ was not accepted by M, will drive
M# to d'' with an empty stack. So M# accepts (¬L)$ and it is deterministic.


D.3  Proof  of  Parikh's  Theorem 

The background for Parikh's Theorem and the definitions of ψ and Ψ are given in
Section 13.7.

THEOREM  D.3  Parikh's  Theorem 


Theorem:  Every  context-free  language  is  letter-equivalent  to  some  regular  language. 

Proof: We will break the proof into two parts. We will first show that, for every
context-free language L, Ψ(L) is semilinear. Then we will show that if Ψ(L) is
semilinear then L is letter-equivalent to some regular language.

For purposes of the following discussion, define:

• The sum of two vectors v_1 and v_2, written v_1 + v_2, to be the pairwise sum of
their elements. So (1, 2) + (5, 7) = (6, 9).

• The product of an integer n and a vector v = (i_1, i_2, ..., i_k), written nv, to be
(ni_1, ni_2, ..., ni_k). So 4(1, 2) = (4, 8).

A set V of integer vectors of the form (i_1, i_2, ..., i_k) is linear iff there exists a
finite basis set B and a second finite set C of vectors c_1, c_2, ..., such that:

V = {v : v = b + n_1c_1 + n_2c_2 + ... + n_|C|c_|C|, where n_1, n_2, ..., n_|C|
are nonnegative integers, b ∈ B, and c_1, c_2, ..., c_|C| ∈ C}.

For example:

• {(2i, i) : 0 ≤ i} = {(0, 0), (2, 1), (4, 2), (6, 3), ...} is linear: B = {(0, 0)} and
C = {(2, 1)}.

• {(i, j) : 0 ≤ i ≤ j} = {(0, 0), (0, 1), (0, 2), ..., (1, 3), ..., (3, 8), ...} is linear:
B = {(0, 0)} and C = {(0, 1), (1, 1)}.

A set V of integer vectors of the form (i_1, i_2, ..., i_k) is semilinear iff it is the
finite union of linear sets. For example, V = {(i, j) : i < j or j < i} is semilinear
because V = V_1 ∪ V_2, where:

• V_1 = {(0, 1), (0, 2), ..., (1, 2), (1, 3), ..., (3, 8), ...} is linear: B = {(0, 1)} and
C = {(0, 1), (1, 1)}, and

• V_2 = {(1, 0), (2, 0), ..., (2, 1), (3, 1), ..., (8, 3), ...} is linear: B = {(1, 0)} and
C = {(1, 0), (1, 1)}.
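
The definition of a linear set can be explored with a small Python sketch. It is only an illustration: the function name linear_set and the cutoff max_coeff are hypothetical, and because a linear set is usually infinite the sketch enumerates only those elements whose coefficients are at most max_coeff.

```python
from itertools import product


def linear_set(B, C, max_coeff):
    """Enumerate {b + n1*c1 + ... + nk*ck : b in B, 0 <= ni <= max_coeff}."""
    out = set()
    for b in B:
        # One coefficient per vector in C, each ranging over 0..max_coeff.
        for coeffs in product(range(max_coeff + 1), repeat=len(C)):
            v = list(b)
            for n, c in zip(coeffs, C):
                v = [vi + n * ci for vi, ci in zip(v, c)]
            out.add(tuple(v))
    return out


# The two linear pieces of {(i, j) : i < j or j < i} from the example above.
V1 = linear_set([(0, 1)], [(0, 1), (1, 1)], 6)
V2 = linear_set([(1, 0)], [(1, 0), (1, 1)], 6)
```

With these bounds, every pair (i, j) with i ≠ j and both entries at most 6 appears in V1 ∪ V2, and no pair with i = j does.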

The core of the proof of Parikh's Theorem is a proof of the claim that if a
language L is context-free, then Ψ(L) is semilinear. In fact, sometimes that claim,
which we prove next, is called Parikh's Theorem.

Let L be an arbitrary context-free language. Then L is defined by some
context-free grammar G = (V, Σ, R, S). Let n = |V - Σ| (i.e., the number of
nonterminals in G) and let b be the branching factor of G (i.e., the length of
the longest right-hand side of any rule in R). Every string in L has at least one
parse tree that can be generated by G. For each such string, choose one of its
"smallest" parse trees. In other words, choose one such that there is no other
one with fewer nodes. Let T be the set of all such chosen trees. So T contains
one smallest tree for each element of L. For any parse tree t, let yield(t) be the
string that is the yield of t.

Let t be an arbitrary element of T. Then either:

• The tree t contains no paths that contain repeated nonterminals. By the
same argument we used in the proof of the Pumping Theorem, the maximum
length of the yield of t is b^n. Call the subset of T that contains all such
trees Short. Short contains a finite number of elements (because there is a
bound on the length of the yields and each yield may correspond to only
one tree).

• The tree t contains at least one path that contains at least one repeated
nonterminal, as shown in Figure D.1. As we did in the proof of the Pumping Theorem,
we can choose one such path and find the first repeated nonterminal, coming
up from the bottom, along that path. Call the subtree rooted at the upper
instance [1] and the subtree rooted at the lower instance [2]. We can excise the
subtree rooted at [1] and replace it by the subtree rooted at [2]. Call the resulting
tree t'. There exist values for u, v, x, y, and z such that yield(t) = uvxyz,
yield(t') = uxz, vy ≠ ε, |vxy| ≤ b^(n+1), and uxz ∈ L.
If t' still contains any paths that contain any repeated nonterminals, then
another vy can be pumped out to yield yet another shorter string in L, and so
forth until a string whose parse tree is in Short is produced.

Let t be an arbitrary element of Short. We will define produce(t) to be the
smallest set of strings that includes yield(t) plus all the longer strings in L that
pump down to yield(t). Or think of it as the smallest set that includes yield(t) and
all the longer strings that can be generated by pumping into t.

Since there is a bound on |vy|, the number of distinct values for vy is finite. For
any tree t in Short, define pumps(t) to be the set of vy strings that can be pumped
out of any element of produce(t) by a single pumping operation. The value of
pumps(t) depends only on t and the rules of G.

We now return to describing strings just by the number of each character that
they contain. Let w be an element of produce(t). Then w contains all the
characters in yield(t). It also contains all the characters in each vy pair that was pumped
out of w in the process of shrinking w down to yield(t). Let VY be a list of all
those vy pairs (so repeats are included). The length of VY is the number of times
some string vy was pumped out of w to produce yield(t). Note that each element
of VY must be an element of pumps(t). Then ψ(w) is the sum of ψ(yield(t)) and the
vectors ψ(vy), one for each element vy of VY.

We claim that Ψ(produce(t)) is linear, with B = {ψ(yield(t))} and
C = {ψ(vy) : vy ∈ pumps(t)}. For this to be true, it must be the case that:

• Let w be an arbitrary element of produce(t). Then ψ(w) is a linear
combination of ψ(yield(t)), the single vector in B, and some finite number of vectors
c_1, c_2, ..., all of which are drawn from C. We just saw that that is true.

• Let v be an arbitrary vector that is a linear combination of ψ(yield(t)) and
some finite number of vectors c_1, c_2, ..., all of which are drawn from C. Then
there must exist some string w in produce(t) such that ψ(w) = v. This follows
from the fact that the Pumping Theorem tells us that any vy string that can be
pumped out can also be pumped in any number of times.

Now we can prove that Ψ(L) is semilinear. There are a finite number of
elements in Short. Every string in L is an element of produce(t) for some t in Short.
So Ψ(L) is the finite union of linear sets:

Ψ(L) = ⋃_(t ∈ Short) Ψ(produce(t)).

The last step in the proof of Parikh's Theorem is to show that, given any
semilinear set V, there exists a regular language L such that Ψ(L) = V. Let ψ⁻¹ be a
function that maps from an integer vector v to the lexicographically first string w
such that ψ(w) = v. For example, if Σ = {a, b, c}, then ψ⁻¹((2, 1, 3)) = aabccc.

We begin by showing that, given any linear set V_1, there exists a regular
language L_1 such that Ψ(L_1) = V_1. Since V_1 is linear, it can be described by the sets
B and C. From them we can produce a regular expression that describes L_1. Let
B = {b_1, b_2, ...} and let C = {c_1, c_2, ...}. Then define R(V_1) to be the regular
expression

(ψ⁻¹(b_1) ∪ ψ⁻¹(b_2) ∪ ...)(ψ⁻¹(c_1) ∪ ψ⁻¹(c_2) ∪ ...)*.

If L_1 is the language defined by R(V_1), then Ψ(L_1) = V_1.

For example, if Σ = {a, b, c}, and V_1 is defined by B = {(1, 2, 3)} and
C = {(1, 0, 0), (0, 0, 1)}, then R(V_1) = (abbccc)(a ∪ c)*.
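
The construction of the regular expression from B and C can be sketched in Python. The function names psi, psi_inverse and regex_for_linear_set are hypothetical, as is the choice to emit Python re syntax (with | in place of ∪). The sketch assumes that the alphabet string sigma lists its characters in sorted order and that B and C are both nonempty.

```python
import re


def psi(w, sigma):
    """Parikh vector of w: the count of each character of sigma, in order."""
    return tuple(w.count(ch) for ch in sigma)


def psi_inverse(v, sigma):
    """Lexicographically first string whose Parikh vector is v."""
    return "".join(ch * n for ch, n in zip(sigma, v))


def regex_for_linear_set(B, C, sigma):
    """Build (ψ⁻¹(b1) ∪ ψ⁻¹(b2) ∪ ...)(ψ⁻¹(c1) ∪ ψ⁻¹(c2) ∪ ...)* for Python's re."""
    base = "|".join(re.escape(psi_inverse(b, sigma)) for b in B)
    period = "|".join(re.escape(psi_inverse(c, sigma)) for c in C)
    return f"(?:{base})(?:{period})*"


# The example from the text: B = {(1, 2, 3)}, C = {(1, 0, 0), (0, 0, 1)}.
pattern = regex_for_linear_set([(1, 2, 3)], [(1, 0, 0), (0, 0, 1)], "abc")
```

Any string that fully matches pattern has a Parikh vector of the form (1, 2, 3) + n_1(1, 0, 0) + n_2(0, 0, 1), which can be checked with psi.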

Now we return to the problem of showing that, given any semilinear set V,
there exists a regular language L such that Ψ(L) = V. If V is semilinear then it is
the finite union of linear sets V_1, V_2, .... Then L is the language described by the
regular expression

R(V_1) ∪ R(V_2) ∪ ...

So we have:

• If L is context-free then Ψ(L) is semilinear.

• If Ψ(L) is semilinear then there is some regular language L' such that
Ψ(L') = Ψ(L).

So, if L is context-free, L is letter-equivalent to some regular language.


The Theory: Turing Machines
and Undecidability

In this appendix, we will prove some of the claims that were made but not proved in
Part IV.


E.1  Proof  that  Nondeterminism  Does  Not  Add  Power 
to  Turing  Machines 

In  this  section  we  complete  the  proof  of  Theorem  17.2. 

THEOREM  17.2  Nondeterminism  in  Deciding  and  Semideciding 
Turing  Machines 

Theorem: If a nondeterministic Turing machine M = (K, Σ, Γ, Δ, s, H) decides a
language L, then there exists a deterministic Turing machine M' that decides L. If
a nondeterministic Turing machine M semidecides a language L, then there exists
a deterministic Turing machine M' that semidecides L.

Discussion: The proof is by construction of M'. When we sketched this proof in
Section 17.3.2, we suggested using breadth-first search as the basis for the
construction. The main obstacle that we face in doing that is bookkeeping. If we use
breadth-first search, then M' will need to keep track of the partial paths that it is
exploring. One approach would be for it to start down path 1, stop after 1 move,
remember the path, go one move down path 2, remember it, and so forth, until all
paths have been explored for one step. It could then return to path 1 (which has
been stored somewhere on the tape), explore each of its branches for one more
move, store them somewhere, find path 2 on the tape, continue it for one more
move, and so forth. But this approach has two drawbacks:

• the amount of memory (tape space) required to keep track of all the partial
paths grows exponentially with the depth of the search, and

• unlike conventional computers with random access memory, the work
required for a Turing machine to scan the tape to find each path in turn and then
shift everything to allow for insertion of new nodes into a path could dominate
all the work that it would do in actually exploring paths.

Iterative deepening, a hybrid between depth-first search and breadth-first
search, avoids both the infinite path pitfall of depth-first search and the
exponentially growing memory requirement of breadth-first search. The idea of iterative
deepening is simple. We can state the algorithm as follows:

ID(T: search tree) =

1. d = 1. /* set the initial depth limit to 1.

2. Loop until a solution is found:

2.1. Starting at the root node of T, use depth-first search to explore all paths in
T of depth d.

2.2. If a solution is found, exit and return it.

2.3. Otherwise, d = d + 1.
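
The algorithm ID can be sketched in Python as follows. This is a conventional recursive version, not the Turing machine implementation developed later in the proof. The names iterative_deepening, successors and is_solution are hypothetical, and the max_depth cutoff is an assumption added so that the sketch terminates when no solution exists; the algorithm as stated loops until a solution is found.

```python
def iterative_deepening(root, successors, is_solution, max_depth=50):
    """Return a path (list of nodes) from root to a solution, or None."""

    def depth_limited(path, limit):
        node = path[-1]
        if is_solution(node):
            return path
        if limit == 0:
            return None  # back up; the partial path is thrown away
        for child in successors(node):
            found = depth_limited(path + [child], limit - 1)
            if found is not None:
                return found
        return None

    for d in range(1, max_depth + 1):  # d = 1, 2, 3, ...
        found = depth_limited([root], d)
        if found is not None:
            return found
    return None
```

On an infinite binary tree of bit strings, where plain depth-first search would follow the branch 0, 00, 000, ... forever, this sketch still reaches any node at finite depth.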

Iterative deepening avoids the infinite path pitfall of depth-first search by
exploring each path to depth d before trying any path to depth d + 1. So, if there is
a finite-length path to a solution, it will be found. And iterative deepening avoids
the memory pitfall of breadth-first search by throwing away each partial path
when it backs up. Of course, we do pay a price for that: Each time we start down
a path of length d + 1 we recreate that path up to length d. That seems like a
heavy price, but let's look at it more closely.

Consider a tree such as the one shown in Figure E.1. Each node in the tree
represents a configuration of M and each edge represents a step in a computational
path that M might follow. Observe first that, in iterative deepening, the nodes at
the top of the tree get generated a lot of times. The nodes at the very bottom get
generated only once, the ones at the level above that only twice, and so forth.
Fortunately, there aren't very many nodes at the top of the tree. In fact, the number of
nodes at any level d is larger than the total number of nodes at all previous levels
by approximately a factor of (b - 1), where b is the branching factor of the tree.


FIGURE E.1 A simple search tree.

So starting over every time is not as bad as it at first seems. In fact, the
relatively inefficient implementation of iterative deepening that we will use
examines only a factor of approximately h (the height of the tree that is eventually
explored) more nodes than does a simple breadth-first search to the correct
depth. See E.2 for a proof of this claim.

Proof: We can now return to the task of proving that, for any nondeterministic
Turing machine M = (K, Σ, Γ, Δ, s, H), there exists an equivalent deterministic
Turing machine M'. The proof is by construction of a deterministic M' that
simulates the execution of M. M' will operate as follows: Start with the initial
configuration of M. Use iterative deepening to try longer and longer computational
paths. If any path eventually accepts, M' will discover that and accept. If all paths
reject, M' will discover that and reject. So, if M is a deciding Turing machine, M'
will always halt. If M is only a semidecider, however, then M' may loop forever.

All that remains is to describe how to perform iterative deepening on a
Turing machine. Iterative deepening is usually implemented as a form of bounded
depth-first search. For each depth-limited search, a stack is used to keep track
of the path so far and the search process backs up whenever it reaches its
depth limit. To simplify our implementation, we will choose an approach that
does not require a stack. Instead we will create each path, starting from the
root, each time.

M' will use three tapes, as shown in Figure E.2. Tapes 1 and 2 correspond to the
current path. To see how M' works, we will first define a subroutine P that uses
Tapes 1 and 2 and follows one specific path for some specified number of steps.
Then we will see how M' can invoke P on a sequence of longer and longer paths.

Suppose we want to specify some one specific path through the search tree
that M explores. To do this, we first need to be able to write down the set of
alternatives for each move in some order so that it makes sense to say, “Choose option
1 this time. Choose option 2 the second time, and so forth.” Imagine that we have
that. (We will describe such a method shortly.) Then we could specify any finite
path as a move vector: a finite sequence of integers such as “2, 5, 1, 3”, which we
will interpret to mean: “Follow a path of length 4. For the first move, choose
option 2. For the next, choose option 5. For the third, choose option 1. For the
fourth, choose option 3. Halt.” The Turing machine P that we mentioned above,
the one that follows one particular finite path and reports its answer, is then a
machine that follows a move vector such as “2, 5, 1, 3.”

So now we need a way for P to interpret a move vector. To solve this problem we
first observe that there is a maximum number B of branches at any point in M's
execution. For its next move, M chooses from among |K| states to go to, from among
|Γ| characters to write on the tape, and between moving left and moving right. Thus:

B = 2·|K|·|Γ|.


Tape 0: Input: this tape will never change so the original input is always available

Tape 1: Copy of input: to be changed as needed on each path

Tape 2: Sequence of choices that determine the current path


FIGURE E.2 Iterative deepening on a three-tape Turing machine.


Table E.1 A table that lists all of M's move choices.

(a)
                         1              2              3              ...            B
(state 1, char 1)        move choice 1  move choice 2
(state 1, char 2)        move choice 1  move choice 2  move choice 3  move choice 4
...                      move choice 1
(state 2, char 1)        move choice 1  move choice 2  move choice 3
...                      move choice 1  move choice 2
(state |K|, char |Γ|)    move choice 1

(b)
                         1              2              3              ...            B
(state 1, char 1)        move choice 1  move choice 2  move choice 1  move choice 2  move choice 1
(state 1, char 2)        move choice 1  move choice 2  move choice 3  move choice 4  move choice 1
...                      move choice 1  move choice 1  move choice 1  move choice 1  move choice 1
(state 2, char 1)        move choice 1  move choice 2  move choice 3  move choice 1  move choice 2
...                      move choice 1  move choice 2  move choice 1  move choice 2  move choice 1
(state |K|, char |Γ|)    move choice 1  move choice 1  move choice 1  move choice 1  move choice 1


Since there are only at most B choices at each point, the largest number that
can occur in any move vector for M is B. Of course, it will often happen that Δ
offers M many fewer choices given its current state and the character under its
read/write head. Suppose that we imagine organizing Δ so that, for each (q, c) pair,
we get a list (in some arbitrary order) of the moves that M may make if it is in state
q and c is under the read/write head. We can enter that information into an
indexable table T as shown in Table E.1(a). We assume that we can sequentially number
both the states and the elements of Γ. Each move choice is an element of
(K × Γ × {→, ←}). But what happens if P is told to choose move j and fewer
than j choices are available? To solve this problem, we will fill out T by repeating
the sequence of allowable moves as many times across each row as necessary to
fill up the row. So T will actually be as shown in Table E.1(b).

There  are  two  important  things  about  each  row  in  this  table: 

• Every entry is a move that is allowed by Δ, and

• every move that is allowed by Δ appears at least once.

Also notice that, given a particular nondeterministic Turing machine M, we
can build this table and it will contain a finite number of cells. In addition, there is
a finite number |K| of different states that M can be in at any point in following
some path. So, given M, we can build a new Turing machine P to follow one of
M's paths and we can encode all of the move table, as well as the current simulated
state of M, in P's finite state controller.
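
The padding of T can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the construction itself: Δ is supplied as a dictionary (here called delta) that maps each (state, character) pair to a nonempty list of moves, and the name build_move_table is our own.

```python
from itertools import cycle, islice

def build_move_table(delta, num_states, num_symbols):
    """Pad every row of the move table to B columns by repeating its moves.

    delta maps (state, char) -> list of moves (next_state, write_char, direction).
    This dictionary layout is an assumption made for the sketch.
    """
    B = 2 * num_states * num_symbols        # B = 2 * |K| * |Gamma|
    # cycle() repeats the allowable moves; islice() cuts the row off at B entries.
    return {pair: list(islice(cycle(moves), B)) for pair, moves in delta.items()}

# A toy transition relation with |K| = 2 and |Gamma| = 2, so B = 8.
delta = {
    ("q0", "a"): [("q1", "a", "R"), ("q0", "b", "L")],
    ("q0", "b"): [("q1", "b", "R")],
}
T = build_move_table(delta, num_states=2, num_symbols=2)
```

Because each row has at most B distinct moves, every move allowed by Δ appears at least once in its padded row, which is exactly the pair of properties listed above.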
Tape 1: Input

Tape 2: A move vector such as: 1 3 2 6 5 4 3 6

FIGURE E.3 Using two tapes to simulate one path of fixed length.

We are now ready to define P (the Turing machine that follows one finite path
that M could take). P uses two tapes, as shown in Figure E.3. The table T and the
current (simulated) state of M are encoded in the state of P, which operates as follows.

1. For i = 1 to length of the move vector on Tape 2 do:

    1.1. Determine c, the character under the read/write head of Tape 1.

    1.2. Consider q, the current simulated state of M. If q is a halting state, halt.
         Otherwise:

    1.3. Determine v, the value of square i of the move vector.

    1.4. Look in T to determine the value in the row labeled (q, c) and the
         column labeled v. Call it m.

    1.5. Make move m (by writing on Tape 1, moving Tape 1's read/write head,
         and changing the simulated state as specified by m).


Whatever happens, P halts after at most n steps, where n = |Tape 2|.
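
The steps of P can be mirrored by a short Python function. This is a sketch rather than a Turing machine: the function name follow_path, the dictionary representation of the padded table T, the use of "R" and "L" for the two directions, the blank symbol "_", and the choice to start the head on the first input character are all assumptions made for illustration. It also assumes that T has a row for every (state, character) pair that a non-halting path can reach.

```python
from collections import defaultdict

def follow_path(table, halting, start_state, w, move_vector, blank="_"):
    """Follow one path of M and return the simulated state that is reached."""
    tape = defaultdict(lambda: blank, enumerate(w))   # Tape 1: working copy of the input
    state, head = start_state, 0
    for v in move_vector:                  # step 1: one pass per square of Tape 2
        c = tape[head]                     # step 1.1: character under the head
        if state in halting:               # step 1.2: stop once M has halted
            break
        # steps 1.3 and 1.4: column v (numbered from 1) of row (q, c)
        next_state, write, direction = table[(state, c)][v - 1]
        tape[head] = write                 # step 1.5: write, move, change state
        head += 1 if direction == "R" else -1
        state = next_state
    return state

# Toy table: in state s on "a", option 1 accepts (state y) and option 2 moves on.
toy_table = {
    ("s", "a"): [("y", "a", "R"), ("s", "a", "R")],
    ("s", "_"): [("n", "_", "R"), ("n", "_", "R")],
}
result = follow_path(toy_table, {"y", "n"}, "s", "aa", [2, 2, 1])
```

The loop body runs at most once per entry of the move vector, which matches the bound of n = |Tape 2| steps.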

Now that we have specified P, we are ready to specify M', the deterministic
Turing machine that is equivalent to M. M' uses three tapes: Tape 0 holds the
original input to M. It will not change throughout the computation. Tapes 1 and 2 will
be used by instantiations of P. The job of M' is to invoke P with all paths of length 0,
then all paths of length 1, all paths of length 2, and so forth. For example, suppose
that B = 4. Then the value on Tape 2 at the first several calls by M' to P will be:

ε; 1; 2; 3; 4; 1,1; 1,2; 1,3; 1,4; 2,1; ...; 2,4; 3,1; ...; 3,4; 4,1; ...; 4,4; 1,1,1; 1,1,2; ...
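
The order in which move vectors appear on Tape 2 can be reproduced with a small Python generator. The name move_vectors is ours, and representing each vector as a tuple is an assumption of this sketch; the empty tuple plays the role of ε.

```python
from itertools import count, islice, product

def move_vectors(B):
    """Yield all move vectors over {1, ..., B}: shortest first, then lexicographically."""
    for n in count(0):                                  # path lengths 0, 1, 2, ...
        yield from product(range(1, B + 1), repeat=n)   # all B**n vectors of length n

first_seven = list(islice(move_vectors(4), 7))
```

With B = 4 there are 1 + 4 + 16 = 21 vectors of length at most 2, so the vector 1,1,1 is the twenty-second one generated.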


To see how M' can use P, let's first consider the simplest case, namely the one
in which M' is a semideciding machine that will accept if any path of M accepts;
otherwise it will simply loop looking for some accepting path. In this case, M'
operates as follows on input w.


1. Write ε (corresponding to a path of length 0) on Tape 2.

2. Until P accepts do:

    2.1. Copy w from Tape 0 to Tape 1.

    2.2. Invoke P (i.e., simulate M for |Tape 2| steps following the path
         specified on Tape 2).

    2.3. If P discovers that M would have accepted then halt and accept.

    2.4. Otherwise, generate the lexicographically next string on Tape 2.


Next we consider what must happen if M is to be able to reject as well as to
accept. It can only reject if every path eventually halts and rejects. So now we need
to design M' so that it will halt as soon as one of the following things happens:

• It discovers a path along which M halts and accepts. In this case, M' accepts.

• It has tried all paths until they halt, but all have rejected. In this case, M' rejects.
The first of these conditions can be checked as described above. The second is
a bit more difficult. Suppose that M' discovers that M would halt and reject on
the path 2, 1, 4. M' must continue to try to find some accepting path. But it restarts
every path at the beginning. How is it to know not to try 2, 1, 4, 1, or any other
path starting with 2, 1, 4? It's hard to make it do that, but we can make it notice if
it tries every path of length n, for some n, and all of them have halted.

If every path of M halts, then there is some number n that is the maximum
number of moves made by any path before it halts. M' should be able to notice that
every path of length n halts. At that point, it need not consider any longer paths. So
we'll modify M' so that, in its finite state controller, it remembers the value of a
Boolean variable we can call nothalted, which we'll initialize to False. Whenever
M' tries a path that hasn't yet halted, it will set nothalted to True. Now consider the
procedure that generates the lexicographically next string on Tape 2. Whenever it
is about to generate a string that is one symbol longer than its predecessor (i.e., it
is about to start looking at longer paths), it will check the value of nothalted. If it is
False, then all paths of the length it was just considering halted. M' can quit. If, on
the other hand, nothalted is True, then there was at least one path that hasn't yet
halted. M' needs to try longer paths. The variable nothalted must be reset to False,
and the next longer set of paths considered. So M' operates as follows on input w:

1. Write ε (corresponding to a path of length 0) on Tape 2.

2. Set nothalted to False.

3. Until P accepts or rejects do:

    3.1. Copy w from Tape 0 to Tape 1.

    3.2. Invoke P (i.e., simulate M for |Tape 2| steps following the path
         specified on Tape 2).

    3.3. If P discovers that M would have accepted then accept.

    3.4. If P discovers that M would not have halted, then set nothalted to True.

    3.5. If the lexicographically next string on Tape 2 would be longer than the
         current one then:

            Check the value of nothalted. If it is False, then reject. All paths of
            the current length halted but none of them accepted.

            Otherwise, set nothalted to False. We'll try again with paths of the
            next longer length and see if all of them halt.

    3.6. Generate the lexicographically next string on Tape 2.

If M is a semideciding Turing machine, then M' will accept iff M would. If M is
a deciding Turing machine, all of its paths must eventually halt. If one of them
accepts, M' will find it and accept. If all of them reject, M' will notice when all paths
of a given length have halted without accepting. At that point, it will reject. So M'
is a deciding machine for L(M).
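
The control loop of M', including the nothalted flag, can be sketched in Python as follows. Several things here are assumptions made for illustration and are not part of the construction: the subroutine P is abstracted as a callback run_path(w, vector) that reports "accept", "reject", or "running"; the loop over lengths replaces the explicit test for "the next string would be longer"; and max_len is a safety cap that a real M' does not have.

```python
from itertools import product

def simulate(run_path, w, B, max_len=None):
    """Return True (accept), False (reject), or None if max_len was exhausted."""
    n = 0                                    # current path length, starting with epsilon
    while max_len is None or n <= max_len:
        nothalted = False                    # step 2, and the reset in step 3.5
        for vector in product(range(1, B + 1), repeat=n):
            outcome = run_path(w, vector)    # steps 3.1 and 3.2
            if outcome == "accept":          # step 3.3
                return True
            if outcome == "running":         # step 3.4
                nothalted = True
        if not nothalted:                    # step 3.5: every path of length n halted
            return False
        n += 1                               # step 3.6 moves on to longer vectors
    return None

def toy_guesser(w, vector):
    """Toy nondeterministic machine: guess w one character at a time (1 = a, 2 = b)."""
    for ch, v in zip(w, vector):
        if "ab"[v - 1] != ch:
            return "reject"                  # a wrong guess halts the path
    return "accept" if len(vector) >= len(w) else "running"

def toy_rejecter(w, vector):
    """Toy machine whose every path halts and rejects after two moves."""
    return "reject" if len(vector) >= 2 else "running"
```

For the toy machines, simulate(toy_guesser, "ab", 2) finds the accepting vector (1, 2), while simulate(toy_rejecter, "ab", 2) rejects once all four vectors of length 2 have halted.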

E.2 An Analysis of Iterative Deepening

Consider a complete tree T with branching factor b and height h. Assume that each
node of T, including the root, corresponds to a state and each edge corresponds to a
move from one state to another. We want to compare the number of moves that will be
considered for each of three search strategies.
We first consider a straightforward breadth-first search. There are b^d edges between
nodes at level d - 1 and nodes at level d. So the number of moves that will be
considered by breadth-first search, to depth h, will be:

\[ \sum_{d=1}^{h} b^d = \frac{b(b^h - 1)}{b - 1} = \Theta(b^h). \]



Now suppose that we use standard iterative deepening, defined as follows:

ID(T: search tree) =

1. d = 1. /* Set the initial depth limit to 1.

2. Loop until a solution is found:

    2.1. Starting at the root node of T, use depth-first search to explore all
         paths in T of depth d.

    2.2. If a solution is found, exit and return it.

    2.3. Otherwise, d = d + 1.

Assume that ID halts with a solution at depth h. Then the number of moves that it
considered is,

at least:
\[ \sum_{d=1}^{h-1} \left( \sum_{k=1}^{d} b^k \right) + h, \]
and at most:
\[ \sum_{d=1}^{h} \left( \sum_{k=1}^{d} b^k \right) = \frac{b^{h+2} - (h+1)b^2 + hb}{(b-1)^2} = \Theta(b^h). \]


The lower bound comes from the fact that ID must have explored at least one path
at depth h or it would have halted at depth h - 1. The upper bound corresponds to it
finding a solution on the very last path in the tree. To see where that upper bound
formula comes from, notice that ID makes one pass through its loop for each value of d,
so we must sum over all of them. On the d-th pass, it does a simple depth-first search of
a tree of depth d and branching factor b.

Now consider a variant of iterative deepening in which, instead of doing a
backtracking search at each depth limit d, we start each path over again at the root. So each
path of length 1 is considered. Then each path of length 2 is considered, starting each
from the root. Then each path of length 3 is considered, starting from the root, and so
forth. This is the technique we used in Section 36.1 to prove Theorem 17.2. Because
reaching each of the b^d nodes at level d requires d moves, the number of moves that
this algorithm considers is,

at least:
\[ \sum_{d=1}^{h-1} d\, b^d + h, \]
and at most:
\[ \sum_{d=1}^{h} d\, b^d = \frac{h\, b^{h+2} - (h+1)\, b^{h+1} + b}{(b-1)^2} = \Theta(h\, b^h). \]
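
The three upper bounds can be checked numerically. The following Python sketch (the function names are ours, and b >= 2 is assumed so that the closed forms are defined) computes each sum directly and compares it with the corresponding closed form.

```python
def bfs_moves(b, h):
    """Moves considered by breadth-first search to depth h."""
    return sum(b**d for d in range(1, h + 1))

def id_moves(b, h):
    """Upper bound for standard iterative deepening: one bounded search per depth."""
    return sum(bfs_moves(b, d) for d in range(1, h + 1))

def restart_moves(b, h):
    """Upper bound for the variant that restarts every path at the root."""
    return sum(d * b**d for d in range(1, h + 1))

def closed_forms(b, h):
    """The three closed forms, in the same order; every division is exact."""
    bfs = b * (b**h - 1) // (b - 1)
    ident = (b**(h + 2) - (h + 1) * b**2 + h * b) // (b - 1)**2
    restart = (h * b**(h + 2) - (h + 1) * b**(h + 1) + b) // (b - 1)**2
    return bfs, ident, restart

for b in range(2, 6):
    for h in range(1, 8):
        assert closed_forms(b, h) == (bfs_moves(b, h), id_moves(b, h), restart_moves(b, h))
```

Since each term d·b^d is at most h·b^d, the restart variant never considers more than h times as many moves as breadth-first search, which is the factor of approximately h claimed in E.1.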


E.3  The  Power  of  Reduction 

Define  a  planar  grid  to  be  a  set  of  lines  with  two  properties: 

•  No  two  lines  are  co-linear,  and 

•  each  line  is  either  parallel  or  perpendicular  to  every  other  one. 
We'll call each position at which two lines intersect a grid point or just a point. Now
consider the following problem from [Dijkstra EWD-1248]:

Show that, for any finite set of grid points in the plane, we can colour each of the
points either red or blue such that on each grid line the number of red points
differs by at most 1 from the number of blue points.

An  instance  of  this  problem  could  be  the  grid  shown  in  Figure  E.4(a).  The  selected 
grid  points  are  shown  as  circles.  One  way  to  attack  the  problem  is  directly:  We  could 
prove  the  claim  using  operations  on  the  grid.  An  alternative  is  to  reduce  the  problem  to 
one  that  is  stated  in  some  other  terms  that  give  us  useful  tools  for  finding  a  solution. 

[Misra 1996] suggests reducing this grid problem to a graph problem. The reduction
described there works as follows: Given a grid and a finite set of points on the grid,
construct a graph in which each grid line becomes a vertex and there is an edge
between two vertices iff the corresponding grid lines share one of the given points. The
graph that is produced from our example grid is shown in Figure E.4(b).

Notice that the number of edges in the constructed graph is finite (since the number
of points in the grid problem is finite). Define the polarity of a vertex to be the
absolute value of the difference between the number of red and blue edges incident on it.
The problem to be solved is now to show that there exists a way to color the edges of
the graph in such a way that the polarity of each vertex is at most one. We'll show that
the required coloring exists by describing an algorithm to construct it.

Observe that, in any graph that this reduction builds, each vertex corresponds either
to a vertical or to a horizontal grid line. Since each edge connects a “vertical” vertex to
a “horizontal” vertex, the graph must be bipartite. (In other words, it is possible to
divide the vertices into two sets, in this case the “horizontal” ones and the “vertical” ones,
in such a way that no edge is incident on two vertices in the same set.)

Now  we  can  exploit  anything  we  know  about  bipartite  graphs.  In  particular,  we’ll  use 
the  fact  that,  in  a  bipartite  graph,  every  cycle  has  an  even  number  of  edges.  So,  in  any 
cycle,  we  can  color  the  edges  alternately,  red  and  blue,  without  affecting  the  polarity  of 
any  vertex.  Hence,  we  may  remove  all  cycles  from  the  graph  (in  arbitrary  order)  and 
solve  the  coloring  problem  over  the  remaining  edges.  After  removing  the  cycles,  we  are 
left  with  an  acyclic  undirected  graph  (i.e.,  a  tree  or  a  forest  of  trees). 


FIGURE  E.4  A  grid  problem  and  its  corresponding  graph  version. 
If the forest is not connected, then each maximal connected tree within it can be
colored independently of the others since no pair of such trees shares any vertices. To color
each tree, begin by designating some vertex to be the root. Color the edges incident on
the root alternately. Then pick any vertex that has both colored and uncolored incident
edges. If there is no such vertex then all edges have been colored. Otherwise, the vertex
has exactly one colored edge, say red, incident on it; color the incident uncolored edges
alternately starting with blue, so as to meet the polarity constraint.
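
The whole argument, from the reduction through the tree coloring, can be turned into a short Python sketch. It is one possible rendering, under assumptions of our own: points are (x, y) pairs, so that x names a vertical line and y a horizontal one; the function name color_points is hypothetical; and cycles are located by first stripping leaf edges and then walking until a vertex repeats, which is just one convenient way to "remove all cycles in arbitrary order".

```python
from collections import defaultdict, deque

def color_points(points):
    """Return {point: 'red' or 'blue'} with every grid line balanced to within 1."""
    # Reduction: the point (x, y) is an edge joining vertical line x to horizontal line y.
    ends = {p: (("v", p[0]), ("h", p[1])) for p in set(points)}
    color, remaining = {}, set(ends)

    def adjacency(edges):
        adj = defaultdict(set)
        for e in edges:
            for vertex in ends[e]:
                adj[vertex].add(e)
        return adj

    def other_end(e, vertex):
        a, b = ends[e]
        return b if vertex == a else a

    def find_cycle(edges):
        adj = adjacency(edges)
        leaves = deque(v for v in adj if len(adj[v]) == 1)
        while leaves:                        # strip leaf edges until none are left
            v = leaves.popleft()
            if len(adj[v]) != 1:
                continue
            (e,) = adj[v]
            u = other_end(e, v)
            adj[v].discard(e)
            adj[u].discard(e)
            if len(adj[u]) == 1:
                leaves.append(u)
        start = next((v for v in adj if adj[v]), None)
        if start is None:
            return None                      # what remains is a forest
        path_vertices, path_edges, seen_at = [start], [], {start: 0}
        while True:                          # every vertex now has degree >= 2
            v = path_vertices[-1]
            e = next(x for x in adj[v] if not path_edges or x != path_edges[-1])
            u = other_end(e, v)
            path_edges.append(e)
            if u in seen_at:                 # the walk closed a cycle at u
                return path_edges[seen_at[u]:]
            seen_at[u] = len(path_vertices)
            path_vertices.append(u)

    cycle = find_cycle(remaining)
    while cycle is not None:                 # cycles are even, so alternation closes up
        for i, e in enumerate(cycle):
            color[e] = ("red", "blue")[i % 2]
        remaining -= set(cycle)
        cycle = find_cycle(remaining)

    adj = adjacency(remaining)               # the leftover forest
    for root in list(adj):
        if all(e in color for e in adj[root]):
            continue                         # this tree has already been colored
        queue = deque([(root, "blue")])      # pretend the root was entered on blue
        while queue:
            v, incoming = queue.popleft()
            nxt = "red" if incoming == "blue" else "blue"
            for e in adj[v]:
                if e not in color:
                    color[e] = nxt
                    queue.append((other_end(e, v), nxt))
                    nxt = "red" if nxt == "blue" else "blue"
    return color
```

Each cycle contributes one red and one blue edge at every vertex it passes through, and the forest phase colors the remaining edges at each vertex alternately, so no vertex ends up with polarity greater than one.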


E.4 The Undecidability of the Post Correspondence Problem

In  Section  22.2,  we  defined  the  language: 

•  PCP  =  {<P>  :  P  is  an  instance  of  the  Post  Correspondence  problem  and  P  has  a 
solution}. 

Theorem 22.1 asserts that PCP is in SD/D. We proved that it is in SD by presenting the
algorithm, M_PCP, that semidecides it. We will now present the proof that it is not in D.

We begin by defining a related language MPCP (modified PCP). An instance of
MPCP looks exactly like an instance of PCP. So it is a string <P> of the form:

\[ \langle P \rangle = (x_1, x_2, x_3, \ldots, x_n)(y_1, y_2, y_3, \ldots, y_n), \text{ where } \forall j\, (x_j \in \Sigma^+ \text{ and } y_j \in \Sigma^+). \]

The difference between PCP and MPCP is in the definition of a solution. A solution
to an MPCP instance is a finite sequence 1, i₂, ..., iₖ of integers such that:

\[ \forall j\, (1 \le i_j \le n \text{ and } x_1 x_{i_2} \cdots x_{i_k} = y_1 y_{i_2} \cdots y_{i_k}). \]

In  other  words,  the  first  index  in  any  solution  must  be  1. 
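
The definition of an MPCP solution translates directly into a checker. In this Python sketch the function name is_mpcp_solution is ours, the X and Y lists are ordinary Python lists of strings, and indices are numbered from 1 as in the text.

```python
def is_mpcp_solution(X, Y, indices):
    """Check whether a sequence of 1-based indices is a solution to the MPCP instance."""
    if not indices or indices[0] != 1:              # an MPCP solution must start with 1
        return False
    if any(not 1 <= i <= len(X) for i in indices):  # every index must lie in 1..n
        return False
    top = "".join(X[i - 1] for i in indices)
    bottom = "".join(Y[i - 1] for i in indices)
    return top == bottom

# A tiny instance: choosing 1 then 2 spells "aab" from both lists.
X = ["a", "ab"]
Y = ["aa", "b"]
```

Dropping the first test gives a checker for ordinary PCP solutions, which is the only difference between the two problems.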

Recall that Theorem 23.3 tells us that the language Lₐ = {<G, w> : G is an
unrestricted grammar and w ∈ L(G)} is not in D. We will show that PCP is not in D in two
steps. We will prove that:

• Lₐ ≤ MPCP, so MPCP is not in D because Lₐ isn't.

• MPCP ≤ PCP, so PCP is not in D because MPCP isn't.

THEOREM E.1 MPCP is Not in D

Theorem: MPCP is not in D.

Proof: The proof is by reduction from Lₐ = {<G, w> : G is an unrestricted
grammar and w ∈ L(G)}. Given a string <G, w>, we show how to construct an
instance P of MPCP with the property that P has a solution iff G generates w (and
thus <G, w> is in Lₐ). The idea is that we'll construct the X and Y lists of P so
that they can be used to build up strings that describe derivations that G can
produce. We'll make sure that it is possible to build the same string from the two
lists exactly in case G can generate w.

Let G = (V, Σ, R, S) be an unrestricted grammar. Suppose that G can derive
w. Then there is a string of the following form that describes the derivation:

S ⇒ x₁ ⇒ x₂ ⇒ ... ⇒ w

Let % and & be two symbols that are not in V. We'll use % to mark the beginning of
a derivation and & to mark the end. Using this convention, a derivation will look like:

%S ⇒ x₁ ⇒ x₂ ⇒ ... ⇒ w&

From G and w, the reduction that we are about to define will construct an
MPCP instance with the property that both the X list and the Y list can be used to
generate such derivation strings. We'll design the two lists so that when we use
the X list we are one derivation step ahead of where we are when we use the Y
list. So the only way for the two lists to end up generating the same derivation
string will be to choose a final index that lets Y catch up. We'll make sure that that
can happen only when the final generated string is w.

Specifically, given G = (V, Σ, R, S) and w, we will build the X and Y lists as
shown in Table E.2. The entry that is listed on line one must be on line one. Since
any solution to an MPCP problem must be a sequence that starts with 1, we thus
guarantee that any solution must generate a string that begins with %S ⇒. The
other entries may occur in any order. Notice that the entries that correspond to
the rules of G are “backwards.” This happens because the X-generated string is
one derivation ahead of the Y-generated one.

To see how this construction works, we'll consider a simple example. Let
G = ({S, A, B, a, b, c}, {a, b, c}, R, S), where R =

S → ABc
S → ABSc
AB → BA
Bc → bc
BA → a
A → a

Given G and the string w = ac, the reduction will build the MPCP instance
shown in Table E.3. G can derive ac. So this MPCP instance has a solution,
(1, 9, 15, 11, 6, 15, 13, 6, 2), shown in Figure E.5.
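
The construction of the two lists can be sketched in Python and checked against the example. The function name build_mpcp is ours. The sketch assumes that every grammar symbol is a single character, that the characters %, &, and ⇒ do not occur in V, and that the symbols and rules are supplied in the order in which they should be numbered (here, the order used in Table E.3).

```python
def build_mpcp(symbols, rules, w, start="S"):
    """Build the X and Y lists for grammar rules (lhs, rhs) and target string w."""
    X = ["%" + start + "⇒", "&"]           # entry 1 puts X one step ahead; entry 2 ends
    Y = ["%", "⇒" + w + "&"]               # entry 2 lets Y catch up, but only on w
    for c in symbols:                       # copy characters that don't change
        X.append(c)
        Y.append(c)
    for lhs, rhs in rules:                  # rules appear "backwards": X gets the rhs
        X.append(rhs)
        Y.append(lhs)
    X.append("⇒")                           # the derivation arrow is copied as well
    Y.append("⇒")
    return X, Y

symbols = ["S", "A", "B", "c", "b", "a"]
rules = [("S", "ABc"), ("S", "ABSc"), ("AB", "BA"), ("Bc", "bc"), ("BA", "a"), ("A", "a")]
X, Y = build_mpcp(symbols, rules, "ac")

solution = [1, 9, 15, 11, 6, 15, 13, 6, 2]
top = "".join(X[i - 1] for i in solution)
bottom = "".join(Y[i - 1] for i in solution)
```

Both lists spell out the same derivation string, %S⇒ABc⇒BAc⇒ac&, so the index sequence given above is indeed a solution.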

To complete the proof, we must show that the MPCP instance, P, that is built
from the input, <G, w>, has a solution iff G can derive w. The formal argument
can be made by induction on the length of a derivation. We omit it. The general
idea is as we suggested above. Any MPCP solution starts with the index 1. Given
the lists as we have described them, this means that the X-generated string starts
out with one more derivation step than does the Y-generated string. The only way
for the Y-generated string to “catch up” is to use the second entry in the table. If
that is done then the final generated string can only be w. In between the first
index and the last one, all the table entries have been constructed so that all and
only derivations that match the rules in G can be generated. So the two lists will
correspond iff G can generate w.

Table E.2 Building an MPCP instance from a grammar and a string.

     X       Y        Comment
1    %S⇒     %        Get started, with X one step ahead.
     &       ⇒w&      End, with Y doing its last step and catching up.
     c       c        For every symbol c in V: copy characters that don't change.
     β       α        For every rule α → β in R: apply each rule. X will generate β
                      when Y is one step behind and so is generating α.
     ⇒       ⇒

Table E.3 An example of building an MPCP instance from a grammar and a string.

     X       Y              X       Y
1    %S⇒     %         9    ABc     S
2    &       ⇒ac&      10   ABSc    S
3    S       S         11   BA      AB
4    A       A         12   bc      Bc
5    B       B         13   a       BA
6    c       c         14   a       A
7    b       b         15   ⇒       ⇒
8    a       a

So we have that Lₐ ≤ MPCP. Let R(<G, w>) be the reduction that we have
just described. If there existed a Turing machine M that decided MPCP, then
M(R(<G, w>)) would decide Lₐ. But Lₐ is not in D, so no decider for it exists.
So M does not exist either and MPCP is not in D.


FIGURE E.3 This MPCP instance has (1, 9, 15, 11, 6, 15, 13, 6, 2) as a solution.
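The claim that (1, 9, 15, 11, 6, 15, 13, 6, 2) solves the example instance can be checked mechanically. The following is a minimal sketch, not part of the original construction: it assumes the fifteen X/Y pairs of the example (for the rules S → ABc, S → ABSc, AB → BA, Bc → bc, BA → a, A → a and the string w = ac), writes the derivation arrow as the ASCII string "=>", and uses the hypothetical names `INSTANCE`, `generated`, and `is_mpcp_solution`.

```python
# Example MPCP instance: index -> (X string, Y string).
# ARROW is an ASCII stand-in for the derivation arrow used in the table.
ARROW = "=>"
INSTANCE = {
    1: ("%S" + ARROW, "%"),
    2: ("&", ARROW + "ac&"),
    3: ("S", "S"), 4: ("A", "A"), 5: ("B", "B"),
    6: ("c", "c"), 7: ("b", "b"), 8: ("a", "a"),
    9: ("ABc", "S"), 10: ("ABSc", "S"),
    11: ("BA", "AB"), 12: ("bc", "Bc"),
    13: ("a", "BA"), 14: ("a", "A"),
    15: (ARROW, ARROW),
}


def generated(instance, indices):
    """Concatenate the X strings and the Y strings selected by indices."""
    x = "".join(instance[i][0] for i in indices)
    y = "".join(instance[i][1] for i in indices)
    return x, y


def is_mpcp_solution(instance, indices):
    """An MPCP solution must start with index 1 and make both lists agree."""
    if not indices or indices[0] != 1:
        return False
    x, y = generated(instance, indices)
    return x == y


solution = (1, 9, 15, 11, 6, 15, 13, 6, 2)
print(is_mpcp_solution(INSTANCE, solution))
print(generated(INSTANCE, solution)[0])
```

The common string spells out the derivation S ⇒ ABc ⇒ BAc ⇒ ac, framed by % and &. The checker only verifies a proposed index sequence; it does not search for one, which is consistent with MPCP not being in D.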
THEOREM E.2 PCP is Not in D

Theorem:  PCP  is  not  in  D. 

Proof: The proof is by reduction from MPCP. In moving from MPCP to PCP, we lose
the constraint that a solution necessarily starts with 1. But we can effectively retain
that constraint by modifying the X and Y lists so that the only sequences that will
cause the two lists to generate the same string must start with 1. Given an MPCP
instance <X, Y>, we will create a PCP instance <A, B> with the property that
<X, Y> has a solution iff <A, B> does. The new lists <A, B> will differ from the
original ones in two ways: each list will contain two new strings, and each string will
be made twice as long by inserting a special symbol after each original symbol (in
the case of the X list) or before each original symbol (in the case of the Y list).

Let MP = <X, Y> be an instance of MPCP with alphabet Σ and size n. Let
¢ and $ be two characters that are not in Σ. We will build P = <A, B>, an
instance of PCP with alphabet Σ ∪ {¢, $} and size n + 2, as follows.

• Assume that: $X = x_1, x_2, \ldots, x_n$ and $Y = y_1, y_2, \ldots, y_n$.

• We construct: $A = a_0, a_1, a_2, \ldots, a_n, a_{n+1}$ and $B = b_0, b_1, b_2, \ldots, b_n, b_{n+1}$.

For values of i between 1 and n, construct the elements of the A and B lists as
follows.

• Let $a_i$ be $x_i$ except that the symbol ¢ will be inserted after each symbol of $x_i$.
For example, if $x_i$ is aab then $a_i$ will be a¢a¢b¢.

• Let $b_i$ be $y_i$ except that the symbol ¢ will be inserted before each symbol of $y_i$.
For example, if $y_i$ is aab then $b_i$ will be ¢a¢a¢b.

Then let: $a_0$ = ¢$a_1$,

$a_{n+1}$ = $,

$b_0 = b_1$, and

$b_{n+1}$ = ¢$.

For  example: 

• If: X = a, baa and Y = ab, aa

• Then: A = ¢a¢, a¢, b¢a¢a¢, $ and B = ¢a¢b, ¢a¢b, ¢a¢a, ¢$
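The construction of <A, B> from <X, Y> is mechanical, so it can be sketched in a few lines. The following is an illustrative sketch under the reading that the element numbered 0 in A is ¢ followed by the element numbered 1, that the element numbered 0 in B is a copy of the element numbered 1, and that the two final elements are $ and ¢$. The function name `mpcp_to_pcp` and the constants `CENT` and `DOLLAR` are hypothetical.

```python
CENT = "\u00a2"   # the special symbol written as the cent sign in the text
DOLLAR = "$"


def mpcp_to_pcp(xs, ys, cent=CENT, dollar=DOLLAR):
    """Build the PCP lists (A, B) from the MPCP lists (X, Y).

    The returned lists are 0-indexed: position 0 is the new first pair and
    the last position (n + 1) is the new final pair.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("X and Y must be non-empty lists of equal length")
    # Insert the special symbol after each symbol of x_i, before each of y_i.
    mid_a = ["".join(ch + cent for ch in x) for x in xs]
    mid_b = ["".join(cent + ch for ch in y) for y in ys]
    a = [cent + mid_a[0]] + mid_a + [dollar]
    b = [mid_b[0]] + mid_b + [cent + dollar]
    return a, b


A, B = mpcp_to_pcp(["a", "baa"], ["ab", "aa"])
print(A)
print(B)
```

For X = a, baa and Y = ab, aa, the sequence (1, 2) is an MPCP solution (both lists generate abaa), and the corresponding sequence (0, 2, 3) makes the new A and B lists generate the same string.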

Now we must show that MP = <X, Y> has an MPCP solution iff P = <A, B>
has a PCP solution:

• If MP = <X, Y> has an MPCP solution, then it is of the form $(1, j_2, j_3, \ldots, j_k)$,
for some k. In that case, the sequence $(0, j_2, j_3, \ldots, j_k, n + 1)$ is a solution to
P = <A, B>. The string that this new sequence produces will be identical to
the string that the original sequence produced except that there will be the
symbol ¢ between each pair of other symbols and the string will start with ¢ and
end with ¢$. We choose the first element of the sequence to be 0 rather than 1
to create the initial ¢ in the A-generated list, and we add the final element,
n + 1, so that the B-generated sequence can catch up and contain the final ¢.
• If P = <A, B> has a PCP solution S, then it must be of the form
(0, <main part>, n + 1), where <main part> is all of S minus its first and
last elements. We know that S has to start with 0 because every string in the B
list starts with the symbol ¢. The only string in the A list that starts with the
symbol ¢ is the first one (which we've numbered 0). So the only way that the
strings that are generated from the two lists can match is for the first index to
be 0. We know that S must end with n + 1 because every string in the A list except
the last ends with ¢. But no string in the B list does. But B's (n + 1)st element
provides that final ¢ and it provides nothing else except the final $. The
string that S produces is identical, if we remove all instances of ¢ and $, to the
string that the sequence (1, <main part>) would produce given <X, Y>.
This is true because we constructed elements 1 through n of <A, B> to be
identical to the corresponding elements of <X, Y> except for the insertion of
¢ and $. And we guaranteed, again ignoring ¢ and $, that $a_0 = a_1 = x_1$ (and,
likewise, that $b_0 = b_1 = y_1$). So the
sequence (1, <main part>) generates the same string from both the X and Y
lists and its first element is 1. So it is an MPCP solution for MP = <X, Y>.

So we have that MPCP ≤ PCP. Let R(<X, Y>) be the reduction that we
have just described. If there existed a Turing machine M that decided PCP, then
M(R(<X, Y>)) would decide MPCP. But MPCP is not in D, so no decider for it
exists. So M does not exist either and PCP is not in D.


APPENDIX F

The Theory: Complexity


In this appendix, we will prove some of the claims that were made but not proved in
Part V.


F.1 Asymptotic Dominance

In this section we prove the claims made in Section 27.5.

F.1.1 Facts about O

We will prove separately each of the claims made in the theorem. The basis for these
proofs is the definition of the relation O: $f(n) \in O(g(n))$ iff there exists a positive
integer k and a positive constant c such that:

$$\forall n \ge k\ (f(n) \le c\,g(n)).$$

Let $f$, $f_1$, $f_2$, $g$, $g_1$, and $g_2$ be functions from the natural numbers to the positive reals,
let a and b be arbitrary real constants, and let $c, c_0, c_1, \ldots, c_k$ be any positive real constants.

Fact 1: $f(n) \in O(f(n))$

Let k = 1 and c = 1. Then $\forall n \ge k\ (f(n) \le c\,f(n))$.

Fact 2: Addition

1. $O(f(n)) = O(f(n) + c_0)$ (if we make the assumption, which will always be true
for the functions we will be considering, that $1 \in O(f(n))$).

We first note that, for any function g(n), if $g(n) \in O(f(n))$ then it must also be
true that $g(n) \in O(f(n) + c_0)$, since $f(n) \le f(n) + c_0$. Now we show the other
direction. Since $1 \in O(f(n))$, there must exist $k_1$ and $c_1$ such that:

$$\forall n \ge k_1\ (1 \le c_1 f(n))$$

$$\forall n \ge k_1\ (c_0 \le c_0 c_1 f(n)). \qquad (1)$$

If $g(n) \in O(f(n) + c_0)$ then there must exist $k_2$ and $c_2$ such that:

$$\forall n \ge k_2\ (g(n) \le c_2 (f(n) + c_0)).$$

Combining that with (1), we get:

$$\forall n \ge \max(k_1, k_2)\ (g(n) \le c_2 (f(n) + c_0 c_1 f(n)) = (c_2 + c_0 c_1 c_2)\,f(n)).$$

So let $k = \max(k_1, k_2)$ and $c = c_2 + c_0 c_1 c_2$. Then,

$$\forall n \ge k\ (g(n) \le c\,f(n)).$$

2. If $f_1(n) \in O(g_1(n))$ and $f_2(n) \in O(g_2(n))$ then $f_1(n) + f_2(n) \in O(g_1(n) + g_2(n))$.

If $f_1(n) \in O(g_1(n))$ and $f_2(n) \in O(g_2(n))$, then there must exist $k_1$, $c_1$, $k_2$ and $c_2$
such that:

$$\forall n \ge k_1\ (f_1(n) \le c_1 g_1(n)),$$

$$\forall n \ge k_2\ (f_2(n) \le c_2 g_2(n)).$$

So:

$$\forall n \ge \max(k_1, k_2)\ (f_1(n) + f_2(n) \le c_1 g_1(n) + c_2 g_2(n) \le \max(c_1, c_2)(g_1(n) + g_2(n))).$$

So let $k = \max(k_1, k_2)$ and $c = \max(c_1, c_2)$. Then,

$$\forall n \ge k\ (f_1(n) + f_2(n) \le c\,(g_1(n) + g_2(n))).$$

3. $O(f_1(n) + f_2(n)) = O(\max(f_1(n), f_2(n)))$.

We first show that if $g(n) \in O(f_1(n) + f_2(n))$ then $g(n) \in O(\max(f_1(n), f_2(n)))$.
If $g(n) \in O(f_1(n) + f_2(n))$, then there must exist $k_1$ and $c_1$ such that:

$$\forall n \ge k_1\ (g(n) \le c_1 (f_1(n) + f_2(n)) \le 2 c_1 \cdot \max(f_1(n), f_2(n))).$$

So let $k = k_1$ and $c = 2c_1$. Then,

$$\forall n \ge k\ (g(n) \le c \cdot \max(f_1(n), f_2(n))).$$

Next we show that if $g(n) \in O(\max(f_1(n), f_2(n)))$ then $g(n) \in O(f_1(n) + f_2(n))$.
If $g(n) \in O(\max(f_1(n), f_2(n)))$, then there must exist $k_1$ and $c_1$ such that:

$$\forall n \ge k_1\ (g(n) \le c_1 \max(f_1(n), f_2(n)) \le c_1 (f_1(n) + f_2(n))).$$

So let $k = k_1$ and $c = c_1$. Then,

$$\forall n \ge k\ (g(n) \le c\,(f_1(n) + f_2(n))).$$
Fact 3: Multiplication

1. $O(f(n)) = O(c_0 f(n))$.

We first show that, if $g(n) \in O(f(n))$, then $g(n) \in O(c_0 f(n))$. If $g(n) \in O(f(n))$,
then there must exist $k_1$ and $c_1$ such that:

$$\forall n \ge k_1\ (g(n) \le c_1 f(n)).$$

So let $k = k_1$ and $c = c_1 / c_0$. (Thus $c_1 = c\,c_0$.) Then,

$$\forall n \ge k\ (g(n) \le c\,c_0 f(n)).$$

Next we show that, if $g(n) \in O(c_0 f(n))$, then $g(n) \in O(f(n))$. If $g(n) \in O(c_0 f(n))$,
then there must exist $k_1$ and $c_1$ such that:

$$\forall n \ge k_1\ (g(n) \le c_1 c_0 f(n)).$$

So let $k = k_1$ and $c = c_1 c_0$. Then,

$$\forall n \ge k\ (g(n) \le c\,f(n)).$$

2. If $f_1(n) \in O(g_1(n))$ and $f_2(n) \in O(g_2(n))$ then $f_1(n) f_2(n) \in O(g_1(n) g_2(n))$.

If $f_1(n) \in O(g_1(n))$ and $f_2(n) \in O(g_2(n))$, then there must exist $k_1$, $c_1$, $k_2$ and $c_2$
such that:

$$\forall n \ge k_1\ (f_1(n) \le c_1 g_1(n)), \text{ and}$$
$$\forall n \ge k_2\ (f_2(n) \le c_2 g_2(n)).$$

Thus:

$$\forall n \ge \max(k_1, k_2)\ (f_1(n) f_2(n) \le c_1 c_2\, g_1(n) g_2(n)).$$

So let $k = \max(k_1, k_2)$ and $c = c_1 c_2$. Then,

$$\forall n \ge k\ (f_1(n) f_2(n) \le c\, g_1(n) g_2(n)).$$

Fact 4: Polynomials

1. If $a \le b$ then $O(n^a) \subseteq O(n^b)$.

If $g(n) \in O(n^a)$, then there must exist $k_1$ and $c_1$ such that:

$$\forall n \ge k_1\ (g(n) \le c_1 n^a \le c_1 n^b) \quad (\text{since } (a \le b) \rightarrow (n^a \le n^b)).$$

So let $k = k_1$ and $c = c_1$. Then:

$$\forall n \ge k\ (g(n) \le c\,n^b).$$

2. If $f(n) = c_j n^j + c_{j-1} n^{j-1} + \cdots + c_1 n + c_0$ then $f(n) \in O(n^j)$.

$$\forall n \ge 1\ (c_j n^j + c_{j-1} n^{j-1} + \cdots + c_1 n + c_0 \le c_j n^j + c_{j-1} n^j + \cdots + c_1 n^j + c_0 n^j = (c_j + c_{j-1} + \cdots + c_1 + c_0)\, n^j).$$

So let $k = 1$ and $c = (c_j + c_{j-1} + \cdots + c_1 + c_0)$. Then:

$$\forall n \ge k\ (f(n) \le c\,n^j).$$
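The witnesses chosen in the polynomial case (k = 1 and c equal to the sum of the coefficients) are easy to sanity-check numerically. The sketch below is only an illustration over a finite range of n, not a proof. It assumes positive integer coefficients so that the arithmetic is exact, and the names `poly`, `witness`, and `check` are hypothetical.

```python
def poly(coeffs, n):
    """Evaluate c_0 + c_1*n + ... + c_j*n^j, where coeffs[i] is c_i."""
    return sum(c * n**i for i, c in enumerate(coeffs))


def witness(coeffs):
    """Return the pair (k, c) used in the proof: k = 1, c = sum of coefficients."""
    return 1, sum(coeffs)


def check(coeffs, n_max=200):
    """Check f(n) <= c * n^j for every n from k up to n_max."""
    k, c = witness(coeffs)
    j = len(coeffs) - 1
    return all(poly(coeffs, n) <= c * n**j for n in range(k, n_max + 1))


# f(n) = 2n^2 + 3n + 5, so c = 10 and the claim is f(n) <= 10 n^2 for n >= 1.
print(witness([5, 3, 2]), check([5, 3, 2]))
```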

Fact 5: Logarithms

1. For a and b > 1, $O(\log_a n) = O(\log_b n)$.

Without loss of generality, it suffices to show that $O(\log_a n) \subseteq O(\log_b n)$. If
$g(n) \in O(\log_a n)$, then there must exist $k_1$ and $c_1$ such that:

$$\forall n \ge k_1\ (g(n) \le c_1 \log_a n).$$

Note that $\log_a n = \log_a b \cdot \log_b n$. So let $k = k_1$ and $c = c_1 \log_a b$. Then,

$$\forall n \ge k\ (g(n) \le c \log_b n).$$

2. If $0 < a < b$ and $c > 1$ then $O(n^a) \subseteq O(n^a \log_c n) \subseteq O(n^b)$.

First we show that $O(n^a) \subseteq O(n^a \log_c n)$. For any $n \ge c$, $\log_c n \ge 1$, so $n^a \le
n^a \log_c n$. If $g(n) \in O(n^a)$, then there must exist $k_1$ and $c_1$ such that:

$$\forall n \ge k_1\ (g(n) \le c_1 n^a).$$

So let $k = \max(k_1, \lceil c \rceil)$ and $c_0 = c_1$. Then,

$$\forall n \ge k\ (g(n) \le c_0\, n^a \log_c n).$$

Next we show that $O(n^a \log_c n) \subseteq O(n^b)$. First, notice that, for $p > 0$ and $n \ge 1$,
we have,

$$\log_e n = \int_1^n x^{-1}\,dx \le \int_1^n x^{-1+p}\,dx = \frac{1}{p}(n^p - 1) < \frac{1}{p}\, n^p.$$

In particular, for $p = b - a$, we have:

$$\log_e n < \frac{1}{b - a}\, n^{b-a}.$$

If $g(n) \in O(n^a \log_c n)$, then there must exist $k_1$ and $c_1$ such that:

$$\forall n \ge k_1\ (g(n) \le c_1 n^a \log_c n).$$

So, for all $n \ge \max(1, k_1)$, we have that:

$$g(n) \le c_1 n^a \log_c n = c_1 n^a \log_c e \,\log_e n < \frac{c_1 \log_c e}{b - a}\, n^a n^{b-a} = \frac{c_1 \log_c e}{b - a}\, n^b.$$

So let $k = \max(1, k_1)$ and $c_0 = \dfrac{c_1 \log_c e}{b - a}$. Then,

$$\forall n \ge k\ (g(n) \le c_0\, n^b).$$
Fact 6: Exponentials (including the fact that exponentials
dominate polynomials)

1. If $1 < a \le b$ then $O(a^n) \subseteq O(b^n)$.

If $g(n) \in O(a^n)$, then there must exist $k_1$ and $c_1$ such that:

$$\forall n \ge k_1\ (g(n) \le c_1 a^n \le c_1 b^n) \quad (\text{since } (a \le b) \rightarrow (a^n \le b^n)).$$

So let $k = k_1$ and $c = c_1$. Then,

$$\forall n \ge k\ (g(n) \le c\,b^n).$$

2. If $a \ge 0$ and $b > 1$ then $O(n^a) \subseteq O(b^n)$.

If a = 0, then we have that $O(1) \subseteq O(b^n)$, which is trivially true. We now consider
the case in which a > 0. First notice that, if $p \ge 1$ and $n \ge p$, then:

$$\log_e n = \int_1^n \frac{1}{x}\,dx = \int_1^p \frac{1}{x}\,dx + \int_p^n \frac{1}{x}\,dx$$

$$\le \log_e p + \int_p^n \frac{1}{p}\,dx$$

$$= \log_e p + \frac{n - p}{p}$$

$$< \log_e p + \frac{n}{p}.$$

If, in particular, $p = \max\left(\frac{a}{\log_e b}, 1\right)$, then $\frac{1}{p} \le \frac{\log_e b}{a}$, $p \ge 1$ and:

$$\log_e n \le \log_e p + \frac{\log_e b}{a}\, n$$

$$a \log_e n \le a \log_e p + \log_e b \cdot n.$$

And:

$$n^a = e^{a \log_e n} \le e^{a \log_e p + \log_e b \cdot n} = p^a \cdot b^n.$$

If $g(n) \in O(n^a)$, then there must exist $k_1$ and $c_1$ such that:

$$\forall n \ge k_1\ (g(n) \le c_1 n^a).$$

So, again letting $p = \max\left(\frac{a}{\log_e b}, 1\right)$, let $k = \max(k_1, \lceil p \rceil)$ and $c = c_1 p^a$. Then,

$$\forall n \ge k\ (g(n) \le c\,b^n).$$
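The key inequality in the proof that exponentials dominate polynomials is that n to the power a is at most p to the power a times b to the power n, for all n at least p, where p is the larger of a divided by the natural log of b, and 1. The sketch below checks that inequality numerically over a finite range. It compares logarithms rather than the numbers themselves to avoid floating-point overflow. The function names and the small tolerance are assumptions made for illustration.

```python
import math


def exp_witness(a, b):
    """Return (p, p**a) with p = max(a / ln b, 1), as in the proof."""
    p = max(a / math.log(b), 1.0)
    return p, p ** a


def check(a, b, n_max=500):
    """Check a*ln(n) <= a*ln(p) + n*ln(b) for every integer n in [ceil(p), n_max].

    This is the logarithm of the claimed inequality n**a <= p**a * b**n.
    """
    p, _ = exp_witness(a, b)
    tolerance = 1e-12  # guards against floating-point rounding only
    return all(
        a * math.log(n) <= a * math.log(p) + n * math.log(b) + tolerance
        for n in range(math.ceil(p), n_max + 1)
    )


print(check(3, 2), check(10, 1.1), check(0.5, 3))
```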
3. If $f(n) = c_{j+1} 2^n + c_j n^j + c_{j-1} n^{j-1} + \cdots + c_1 n + c_0$, then $f(n) \in O(2^n)$.

From Fact 4.2, we have that $c_j n^j + c_{j-1} n^{j-1} + \cdots + c_1 n + c_0 \in O(n^j)$. From Fact 6.2,
we have that $n^j \in O(2^n)$. So, using the transitivity property that we prove in Fact 8, we
have that $c_j n^j + c_{j-1} n^{j-1} + \cdots + c_1 n + c_0 \in O(2^n)$. So there must exist $k_1$ and $k_2$
such that:

$$\forall n \ge k_1\ (c_j n^j + c_{j-1} n^{j-1} + \cdots + c_1 n + c_0 \le k_2 2^n)$$

$$\forall n \ge k_1\ (c_{j+1} 2^n + c_j n^j + c_{j-1} n^{j-1} + \cdots + c_1 n + c_0 \le c_{j+1} 2^n + k_2 2^n).$$

So let $k = k_1$ and $c = c_{j+1} + k_2$. Then,

$$\forall n \ge k\ (f(n) \le c\,2^n).$$


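Fact 7: Factorial dominates exponentials: If $a \ge 1$ then $O(a^n) \subseteq O(n!)$

First notice that, if $a \ge 1$, then:

$$a^n = \left(\prod_{k=1}^{n} \frac{a}{k}\right) \cdot n! \le \left(\prod_{k=1}^{\lceil a \rceil - 1} \frac{a}{k}\right) \cdot n!,$$

since each factor $a/k$ is at least 1 when $k \le \lceil a \rceil - 1$ and at most 1 when $k \ge \lceil a \rceil$.

If $g(n) \in O(a^n)$, then there must exist $k_1$ and $c_1$ such that:

$$\forall n \ge k_1\ (g(n) \le c_1 a^n).$$

So let $k = k_1$ and $c = c_1 \prod_{k=1}^{\lceil a \rceil - 1} \frac{a}{k}$. Then,

$$\forall n \ge k\ (g(n) \le c\,n!).$$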
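The claim that factorial dominates exponentials can be sanity-checked with exact rational arithmetic. The sketch below assumes that the constant in the bound is the product of a/k for k from 1 to ⌈a⌉ − 1, which is one reading of the proof's constant. It then checks that a to the power n is at most that constant times n! over a finite range of n. The names `factorial_constant` and `check` are hypothetical.

```python
from fractions import Fraction
from math import ceil, factorial


def factorial_constant(a):
    """Product of a/k for k = 1 .. ceil(a) - 1 (an empty product is 1)."""
    a = Fraction(a)
    c = Fraction(1)
    for k in range(1, ceil(a)):
        c *= a / k
    return c


def check(a, n_max=60):
    """Check a**n <= C * n! exactly for n = 1 .. n_max."""
    a = Fraction(a)
    c = factorial_constant(a)
    return all(a**n <= c * factorial(n) for n in range(1, n_max + 1))


print(factorial_constant(3), check(3), check(Fraction(5, 2)), check(1))
```

For a = 3 the constant is 9/2, and the bound is tight at n = 2 and n = 3, where both sides equal 9 and 27 respectively.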


Fact 8: Transitivity: If $f(n) \in O(f_1(n))$ and $f_1(n) \in O(f_2(n))$
then $f(n) \in O(f_2(n))$

If $f(n) \in O(f_1(n))$ and $f_1(n) \in O(f_2(n))$, then there must exist $k_1$, $c_1$, $k_2$ and $c_2$ such that:

$$\forall n \ge k_1\ (f(n) \le c_1 f_1(n)),$$

$$\forall n \ge k_2\ (f_1(n) \le c_2 f_2(n)).$$

So let $k = \max(k_1, k_2)$ and $c = c_1 c_2$. Then,

$$\forall n \ge k\ (f(n) \le c\,f_2(n)).$$


F.1.2 Facts about o

We will prove separately the two claims made in the theorem. The basis for these
proofs is the definition of the relation o: $f(n) \in o(g(n))$ iff, for every positive c, there
exists a positive integer k such that:

$$\forall n \ge k\ (f(n) < c\,g(n)).$$

Let f and g be functions from the natural numbers to the positive reals. Then,

1. $f(n) \notin o(f(n))$:

Let c = 1. Then there exists no k such that $\forall n \ge k\ (f(n) < c\,f(n))$.

2. $o(f(n)) \subset O(f(n))$:

If $g(n) \in o(f(n))$ then, for every positive $c_1$, there exists a $k_1$ such that
$\forall n \ge k_1\ (g(n) < c_1 f(n))$. To show that $g(n) \in O(f(n))$, it suffices to find a single c and
k that satisfy the definition of O. Let c = 1 and let k be the $k_1$ that must exist
if $c_1 = 1$.

F.2 The Linear Speedup Theorem

In Section 27.5 we introduced the theory of asymptotic dominance so that we could
describe the time and space requirements of Turing machines by the rate at which those
requirements grow, rather than by some more exact measure of them. One consequence
of this approach is that constant factors get ignored. This makes sense for two
reasons. The first is that, in most of the problems we want to consider, such factors are
dominated by much faster growing ones, so they have little impact on the size of the
problems that we can reasonably solve.

But a second reason is the one that we will focus on here: All but the most efficient
Turing machines can be sped up by any constant factor. The idea behind this
claim is simple. At each step of its operation, a Turing machine visits one tape square.
If we can compress the contents of the tape so that they fit on fewer squares, we can
reduce the number of steps that a Turing machine must execute to process the tape.
Compressing a tape is easy. We simply increase the size of the tape alphabet. That
enables us to encode a chunk of squares from the original tape into a single square
of the new one.


EXAMPLE  F.1  Encoding  Multiple  Tape  Squares  as  a  Single  Square 

Let M be a Turing machine whose tape alphabet is {0, 1, □}. We can build a new
Turing machine M' that uses the tape alphabet {A-Z, a-z, 0-9, α-ω}. Using the new
alphabet, we can encode four squares of tape as a single square on the tape
M' will use. Initially, we'll include in the encoding at least one blank on either side
of the input (plus more on the right for padding if necessary). If M ever moves off
its input, new squares can be encoded as necessary. So, for example:

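If the tape of M is: [figure: a fragment of M's tape over {0, 1, □}, bracketed into groups of four squares]

Then the tape of M' might be: [figure: the same fragment, with each group of four squares replaced by a single encoded symbol]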

If  we  design  the  transitions  of  M'  appropriately,  it  will  be  able  to  do  in  one  step 
what  it  takes  M  four  steps  to  do. 
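The encoding of four squares as one can be sketched concretely. Four squares over a three-symbol alphabet have 3^4 = 81 possible contents, and an alphabet made of A-Z, a-z, 0-9, and the 24 lowercase Greek letters has 86 symbols, so each block can be given its own symbol. The particular assignment below (reading a block as a base-3 number) is an assumption made for illustration, as are the names `encode` and `decode`.

```python
import string

BLANK = "\u25a1"                       # the blank square
OLD_ALPHABET = "01" + BLANK            # tape alphabet of M (3 symbols)
# Lowercase Greek alpha..omega, skipping the final-sigma code point (24 letters).
GREEK = "".join(chr(cp) for cp in range(0x3B1, 0x3CA) if cp != 0x3C2)
NEW_ALPHABET = (string.ascii_uppercase + string.ascii_lowercase
                + string.digits + GREEK)  # 86 symbols, at least 3**4 = 81
BLOCK = 4                              # squares of M per square of M'


def encode(tape):
    """Pad with blanks on the right, then map each block of 4 squares to one symbol."""
    padded = tape + BLANK * (-len(tape) % BLOCK)
    out = []
    for i in range(0, len(padded), BLOCK):
        code = 0
        for sym in padded[i:i + BLOCK]:
            code = code * len(OLD_ALPHABET) + OLD_ALPHABET.index(sym)
        out.append(NEW_ALPHABET[code])
    return "".join(out)


def decode(encoded):
    """Invert encode: expand each symbol back into its block of 4 squares."""
    out = []
    for sym in encoded:
        code = NEW_ALPHABET.index(sym)
        block = []
        for _ in range(BLOCK):
            code, digit = divmod(code, len(OLD_ALPHABET))
            block.append(OLD_ALPHABET[digit])
        out.append("".join(reversed(block)))
    return "".join(out)


tape = BLANK + "0011010" + BLANK
print(len(tape), len(encode(tape)))
```

A nine-square tape is padded to twelve squares and stored in three squares of the new machine. The simulation itself still needs transitions that act on whole blocks, which this sketch does not model.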


The compression idea that we have just described is the basis of the Linear Speedup
Theorem that we are about to present. Before we go into the details of the theorem and
its proof, one caveat is in order. While this theorem is of some theoretical interest, its
application to real computers is limited. Real computers have a fixed size alphabet
(generally consisting of two binary symbols). So the sort of compression that we are using here
cannot be applied. However, it is worth noting that other compression algorithms (that
exploit patterns in particular kinds of input strings and are thus able to reduce the
number of bits required) are routinely used in many kinds of real applications.


THEOREM  F.1  The  Linear  Speedup  Theorem 

Theorem: Let M be a k-tape Turing machine where k > 1 and timereq(M) = f(n).
Given any constant c > 0, there exists another k-tape Turing machine M' such
that L(M) = L(M') and

$$\mathit{timereq}(M') \le \left\lceil \frac{f(n)}{c} \right\rceil + 2n + 2 \cdot \lceil 6c \rceil.$$


Notice that c is a factor by which we reduce the time a computation requires.
So, if we want to say that the new program takes half as long, we set c to 2. In
some statements of the theorem, c is a multiplicative factor. Using those versions,
we'd set c to 1/2 in that case. The two formulations are equivalent.

Proof: We prove the claim by describing the operation of M'. M' will begin by
making one pass across its input tape to encode it as described in Example
F.1. It will store the encoded string on tape 2 and blank out tape 1. During the
rest of its operation, it will use tape 1 as M would have used tape 2 and vice
versa. The number of symbols to be collapsed into one at this step is
determined by c, the speedup that is desired. We'll call this collapsing factor m
and set it to $\lceil 6c \rceil$.

Next M' simulates the execution of M. The idea here is that, since M' encodes m
symbols as one, it can process m symbols as one. Unfortunately, on any one tape
the m symbols that M' needs may be spread across two of the new encoded tape
squares: They may fall on the current square plus the one to its right or they may
fall on the current square plus the one to its left. So M' must make one move to the
left, then one back, and then one more to the right and then back before it can be
sure that it has all the information it needs to make the move that simulates m
moves of M. We illustrate this with the following example. Let k = 2 and m = 5.
Each tape square actually contains a single symbol that encodes five original symbols,
but we've shown them here with the original sequences so that it is possible to
see how the simulation works. Suppose that, at some point in its computation, M
has entered some state q and its tapes contain the following fragments:


Tape 1:    □  | 0 0 1 0 0 | 0 1 1 0 0 | 1 1 1 0 1 | 1 0 0 0 0 | 0 0 0 1 0 | ...
                                            ↑

Tape 2:   ... | 0 0 1 1 1 | 1 1 0 0 0 | 0 0 1 1 1 | 1 1 1 0 1 | 1 1 0 1 1 | ...
                                            ↑


If  the  next  five  moves  of  M  move  the  read/write  head  on  tape  1  to  the  right 
five times and they move the read/write head on tape 2 to the left five times, M'
will  need  to  examine  both  one  encoded  square  to  the  left  and  one  encoded 
square  to  the  right  before  it  will  have  enough  information  to  simulate  all  five  of 
those  moves. 

So  M'  simulates  M  by  doing  the  following: 

1.  Move  one  square  to  the  left  on  each  of  the  tapes  and  record  in  the  state  the 
encoded  symbol  it  finds  on  each  of  the  k  tapes. 

2.  Move  one  square  back  to  the  right  on  each  tape  and  record  in  the  state  the 
encoded  symbol  it  finds  on  each  of  the  k  tapes. 

3. Move one more square to the right on each tape and record in the state the
encoded symbol it finds on each of the k tapes.

4. Move one square back to the left on each tape.

At this point, the read/write heads of M' are back where they started and
the state of M' includes the following vector of information that captures
the current state of M:

(q,                        M's state.

 $L_1, C_1, R_1, t_1$,     The relevant contents of tape 1:
                           $L_1$ is the encoded square to the left of the one
                           that contains M's simulated read/write head.
                           $C_1$ is the encoded square that contains M's
                           simulated read/write head.
                           $R_1$ is the encoded square to the right of the one
                           that contains M's simulated read/write head.
                           $t_1$ (an integer between 1 and m) is the position
                           within $C_1$ of M's simulated read/write head.

 $L_2, C_2, R_2, t_2$,     Tape 2: similarly.

 $L_k, C_k, R_k, t_k$)     Tape k: similarly.

5. Using this information, make one move that alters the C squares as necessary
and moves each read/write head as required to simulate M. Also update
M''s state.

6. Make one more move if necessary. If, on some tape, M's simulated read/write
head moved off the C square, it will be necessary to make this second move in
order to alter the contents of either the L or R square to match what M would
have done. But note that, on any given tape, it will only be necessary to work
with the current square plus one to the left or the current square plus one to
the right. So two moves suffice to make all necessary changes to the tapes.

The first phase, encoding the input tape, requires that M' make one complete
pass through the input and then move back to the left. It may have to use up to m
padding blanks. So, in the worst case, on an input of length n, this phase requires
2(n + m) steps. The second phase, simulating M, requires at most six steps for
every m steps that M would have executed. So, if timereq(M) = f(n), then,

$$\mathit{timereq}(M') \le \left\lceil \frac{6 f(n)}{m} \right\rceil + 2(n + m).$$

Since $m = \lceil 6c \rceil$, we then have,

$$\mathit{timereq}(M') \le \left\lceil \frac{f(n)}{c} \right\rceil + 2n + 2 \cdot \lceil 6c \rceil.$$
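The arithmetic behind the final bound can be illustrated numerically. The sketch below assumes that the simulation phase costs at most ⌈6 f(n)/m⌉ steps (six steps of M' per m steps of M) and that encoding costs 2(n + m) steps, with m = ⌈6c⌉. The function names are hypothetical, and f(n) is passed in as an already computed number.

```python
import math


def collapsing_factor(c):
    """m = ceil(6c): how many original squares are packed into one."""
    return math.ceil(6 * c)


def speedup_bounds(f_n, n, c):
    """Return (m, bound from the proof, bound stated in the theorem).

    f_n is timereq(M) on inputs of length n; c is the desired speedup.
    """
    m = collapsing_factor(c)
    proof_bound = math.ceil(6 * f_n / m) + 2 * (n + m)
    theorem_bound = math.ceil(f_n / c) + 2 * n + 2 * m
    return m, proof_bound, theorem_bound


# M takes 100 steps on inputs of length 10; we ask for a speedup of c = 2.
print(speedup_bounds(100, 10, 2))
```

Because m is at least 6c, the proof's bound never exceeds the bound in the theorem statement. The two coincide whenever 6c is an integer. The additive terms 2n + 2m show why the speedup matters only when f(n) grows faster than linearly.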


APPENDICES  G-Q: 

Applications 


In  appendices  G  through  Q,  we  describe  applications  of  the  techniques  that  have 
been  covered  throughout  the  book.  Most  of  the  discussion  is  organized  around  a  col¬ 
lection  of  key  application  areas  that  make  use  of  more  than  one  of  the  ideas  that  we 
have  discussed.  We  will  consider  all  in  the  following  list,  although  we  will  barely  scratch 
the  surface  of  each. 

•  Programming  languages:  syntax  and  compilers 
•  Functional  programming 

•  Tools  for  programming  and  software  engineering,  including  techniques  for  verify¬ 
ing  the  correctness  of  programs  and  of  hardware  designs 

•  Network  protocols,  network  modeling,  and  the  Semantic  Web 
•  Computer  system  security,  cryptography,  hackers  and  viruses 
•  Computational  biology 
•  Natural  language  processing 
•  Artificial  intelligence  and  computational  reasoning 
•  Music 

•  Classic  games  and  puzzles 
•  Interactive  video  games 


Then  we  will  look  at  three  of  the  specific  tools  that  we  have  introduced.  We  will 
briefly  survey  some  of  their  applications  that  lie  outside  the  particular  application 
areas  that  we  will  already  have  covered.  The  three  tools  are 

•  regular  expressions, 

•  finite  state  machines  and  transducers,  and 

•  grammars. 


APPENDIX  G 


APPLICATIONS:  PROGRAMMING 
LANGUAGES  AND  COMPILERS 


The ideas that we have discussed throughout this book form the foundation of
modern programming. Programming languages are typically described with context-free
grammars. Regular expression matchers are built into many modern
programming environments. Finite state transition diagrams enable visual programming.

G.1  Defining  the  Syntax  of  Programming  Languages 

Most  programming  languages  are  mostly  context-free. There  are  some  properties,  such 
as  type  constraints,  that  cannot  usually  be  described  within  the  context-free  frame¬ 
work.  We  will  consider  those  briefly  in  Section  G.2.  But  context-free  grammars  provide 
the  basis  for  defining  most  of  the  syntax  of  most  programming  languages. 

G.1.1  BNF 

It  became  clear  early  on  in  the  history  of  programming  language  development  that 
designing  a  language  was  not  enough.  It  was  also  necessary  to  produce  an  unambiguous 
language  specification.  Without  such  a  specification,  compiler  writers  were  unsure 
what  to  write  and  users  didn't  know  what  code  would  compile.  The  inspiration  for  a 
solution  to  this  problem  came  from  the  idea  of  a  rewrite  or  production  system  as 
described years earlier by Emil Post. (See Section 18.2.4.) In 1959, John Backus confronted
the specification problem as he tried to write a description of the new language
ALGOL 58. Backus later wrote (Backus 1980), "As soon as the need for precise
description was noted, it became obvious that Post's productions were well-suited for
that purpose. I hastily adapted them for use in describing the syntax of IAL [Algol
58]." The notation that he designed was modified slightly in collaboration with Peter
Naur and used in the definition, two years later, of ALGOL 60. The ALGOL 60 notation
became known as BNF, for Backus Naur form or Backus Normal form. For the definitive
specification of ALGOL 60, using BNF, see [Naur 1963]. Just as the ALGOL 60 language
influenced the design of generations of procedural programming languages, BNF
has served as the basis for the description of those new languages, as well as others.

The  BNF  language  that  Backus  and  Naur  used  exploited  these  special  symbols: 

•  ::=  corresponds to →,

•  |  means  or,  and 

•  <  >  surround  the  names  of  the  nonterminal  symbols. 

EXAMPLE  G.1  Standard  BNF 

Our  term/factor  grammar  for  arithmetic  expressions  would  be  written  as  follows 
in  the  original  BNF  language: 

<E>  ::=  <E>  +  <T>  |  <T> 

<T>  ::=  <T>  *  <F>  |  <F>

<F>  ::=  id  |  (<E>)
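To see how such a grammar drives a program, here is a minimal recognizer sketch for the three rules for <E>, <T>, and <F>. It is not a direct transcription: the left-recursive rules for <E> and <T> are replaced by loops ("a term followed by zero or more + term"), a standard substitution that accepts the same strings. Input is assumed to be already split into tokens, and the function name `recognize` is hypothetical.

```python
def recognize(tokens):
    """Return True iff the token list is derivable from <E>.

    tokens is a list of strings drawn from: "id", "+", "*", "(", ")".
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(tok):
        nonlocal pos
        if peek() == tok:
            pos += 1
            return True
        return False

    def factor():          # <F> ::= id | ( <E> )
        if eat("id"):
            return True
        if eat("("):
            return expr() and eat(")")
        return False

    def term():            # <T> ::= <F> { * <F> }
        if not factor():
            return False
        while eat("*"):
            if not factor():
                return False
        return True

    def expr():            # <E> ::= <T> { + <T> }
        if not term():
            return False
        while eat("+"):
            if not term():
                return False
        return True

    return expr() and pos == len(tokens)


print(recognize(["id", "+", "id", "*", "id"]))
print(recognize(["id", "+"]))
```

The recognizer only answers yes or no. Building a parse tree that reflects the left-associative structure of the original rules would take a little more bookkeeping inside the loops.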

While  it  seems  obvious  to  us  now  that  formal  specifications  of  syntax  are  impor¬ 
tant  and  BNF  seems  a  natural  way  to  provide  such  specifications,  the  invention  of 
BNF was an important milestone in the development of computing. John Backus
received the 1977 Turing Award for "profound, influential, and lasting contributions to
the design of practical high-level programming systems, notably through his work on
FORTRAN, and for seminal publication of formal procedures for the specification
of programming languages." Peter Naur received the 2005 Turing Award "For fundamental
contributions to programming language design and the definition of Algol 60,
to compiler design, and to the art and practice of computer programming."

Since  its  introduction  in  1960,  BNF  has  become  the  standard  tool  for  describing  the 
context-free  part  of  the  syntax  of  programming  languages,  as  well  as  a  variety  of  other 
formal  languages:  query  languages,  markup  languages,  and  so  forth.  In  later  years,  it  has 
been  extended  both  to  make  better  use  of  the  larger  character  codes  that  are  now  in 
widespread use and to make specifications more concise and easier to read. For example,
modern versions of BNF

•  often use → instead of ::=.

•  provide a convenient notation for indicating optional constituents. One approach is
to use the subscript OPT. Another is to declare square brackets to be metacharacters
that surround optional constituents. The following rules illustrate three ways to say
the same thing:

S → T | ε
S → T_OPT
S → [T]

•  may  include  many  of  the  features  of  regular  expressions,  which  are  convenient  for 
specifying  those  parts  of  a  language's  syntax  that  do  not  require  the  full  power  of 
the  context-free  formalism. 

These various dialects are called Extended BNF or EBNF.
EXAMPLE G.2 EBNF

In standard BNF, we could write the following rule that describes the syntax of an
identifier that must be composed of an initial letter, followed by zero or more
alphanumeric characters:

<identifier>  ::=  <letter>  |  <letter>  <alphanumseq>

<alphanumseq>  ::=  <alphanum>  |  <alphanum>  <alphanumseq>

<alphanum>  ::=  <letter>  |  <digit>

In  EBNF,  it  can  be  written  as: 

identifier  =  letter  (letter  |  digit)* 

But note, this is a simple example that illustrates the point. In any practical system,
the parsing of tokens, such as identifiers, is generally handled by a lexical analyzer
and not by the context-free parser.
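Because the identifier rule needs only the power of regular expressions, a lexical analyzer can check it with a pattern matcher. The sketch below uses Python's standard `re` module. It assumes that "letter" means an ASCII letter and "digit" means 0 through 9, which the rule itself does not specify. The names `IDENTIFIER` and `is_identifier` are hypothetical.

```python
import re

# identifier = letter (letter | digit)*
# Assumption: letters are A-Z and a-z, digits are 0-9.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_identifier(text):
    """Return True iff the whole string matches the identifier rule."""
    return IDENTIFIER.fullmatch(text) is not None


for candidate in ["x1", "total", "1x", "", "a_b"]:
    print(repr(candidate), is_identifier(candidate))
```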


G.1.2  Railroad  Diagrams 

Context-free grammars fill reference books on every programming language that has
been created since BNF was used to define ALGOL 60. Sometimes more modern definitions
look superficially different from BNF, since other notations have been developed
over the years. For example, railroad diagrams (also called syntax diagrams or
railway tracks) are graphical renditions of the rules of a context-free grammar. Railroad
diagrams have the same expressive power as does BNF, but they are sometimes
easier to read.


EXAMPLE  G.3  A  Railroad  Diagram  for  a  Switch  Statement 

Consider the following BNF specification for a switch statement like the one in
Java (where the subscript OPT, written here as _OPT, indicates an optional constituent):

<switch-statement> ::= SWITCH ( {<int-expression> | <enum-type>} ) { <case-list> }
<case-list> ::= <case-body> <default-clause>_OPT
<case-body> ::= <case-item> | <case-item> <case-body>
<case-item> ::= CASE <value> : <stmt-list> BREAK_OPT
<default-clause> ::= DEFAULT : <stmt-list>

We assume that <stmt-list>, which is used in other places in the grammar, is
defined elsewhere.

Here's the corresponding railroad diagram (again assuming that <stmt-list>
is defined elsewhere):


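[Railroad diagram for switch-stmt. The track reads SWITCH and (, then splits into
two alternative paths, int-expression and enum-type, which rejoin. It continues
through CASE, value, :, and stmt-list, followed by BREAK, and then through
DEFAULT and stmt-list.]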


Terminal strings are shown in upper case. Nonterminals are shown in lower
case. To generate a switch statement, we follow the lines and arrows, starting from
switch-stmt. The word SWITCH appears first, followed by (. Then one of the two
alternative paths is chosen. They converge and then the symbols ) and { appear.
There must be at least one case alternative, but, when it is complete, the path may
return for more. The BREAK command is optional. So is the DEFAULT clause,
in both cases because there are detours around them.


G.2 Are Programming Languages Context-Free?

So far, we have considered the use of context-free grammars to specify the syntax
of individual statements. They are also used to specify the structure of entire
programs. However, there are global properties of programs that cannot be described
within the context-free framework. Recall that, in Section 13.3, we showed that
WcW = {wcw : w ∈ {a, b}*} is not context-free. The structure of the strings in
WcW is very similar to the declaration-use pattern that is common in typed
programming languages.


EXAMPLE  G.4  Why  Java  Isn't  Context  Free 
Here’s  a  syntactically  legal  Java  program: 
public class example
{public static void main ()
   { char todayistuesday;
     todayistuesday = 'a';}}

Here's a string that is not a syntactically legal Java program:

public class example
{public static void main ()
   { char todayiswednesday;
     todayistuesday = 'a';}}

The problem with the second program is that the variable that is used hasn't
been declared. Observe the relationship between the strings of this sort that are
legal and the language WcW = {wcw : w ∈ {a, b}*} by substituting ; for c and
the variable name for w.


To  prove  that  Java  is  not  context-free,  let: 

J = {syntactically legal Java programs} ∩

<prelude> string a*b*; a*b* = 'a';}}.

We've used the shorthand <prelude> for some particular opening string that will
transform the remaining fragment into a legal Java program. So J includes a set of Java
programs that declare a single variable whose name is in a*b* and then do a single
operation, namely assigning to that variable the value 'a'.

By Theorem 13.7, if Java were context-free then J would also be context-free since it
would be the intersection of a context-free language with a regular language. But we
can show that J is not context-free using the Pumping Theorem. Let:

w = <prelude> string a^k b^k ; a^k b^k = 'a';}}
   |        1       | 2 | 3 |4| 5 | 6 |   7    |

If either v or y contains any portions of regions 1, 4, or 7, then set q to 2. The
resulting string violates the form constraint of J. If either v or y overlaps the boundary
between regions 2 and 3 or regions 5 and 6, then set q to 2. The resulting string violates
the form constraint of J. It remains to consider the following cases:

•  (2, 2), (2, 3), (3, 3), (5, 5), (5, 6), (6, 6), (3, 5): Set q to 2. The resulting string s will have
a declaration for one variable and a use of another, thus violating the type
constraints of Java. So s ∉ {syntactically legal Java programs}.

•  (2, 5), (2, 6), (3, 6): Violate the requirement that |vxy| ≤ k.

There is no way to carve w into uvxyz such that all the conditions of the Pumping
Theorem are met. So J is not context-free. So Java is not context-free.

Recall that, in Exercise 13.3, we considered another aspect of type checking:
guaranteeing that each invocation of a declared procedure contains the same number of
parameters as the declaration. We saw that a simple language that exhibits a similar
property is not context-free. Because of these issues, type checking typically cannot be
done with a context-free parser.


G.3 Designing Programming Languages and Their Grammars

So far, we have discussed the syntax of programming languages as though it were a
natural phenomenon over which we have no control. In Appendix L, we'll consider
English; it is such a phenomenon and we have no control. But programming languages are
artificial things, designed by people to serve a particular purpose. It makes sense to
design them so that they have the properties we want. Clearly we want them to be
expressive, easy to use, and hard to make mistakes in (alternatively, easy to check for
mistakes). We also want to design the syntax so that:

•  The language is not inherently ambiguous. If this is true, then we will be able to
design an unambiguous grammar that will generate exactly one parse tree (and thus
one interpretation) for each string in the language.

•  The language can be parsed efficiently (i.e., deterministically). This requirement
imposes a stronger constraint than does the need to avoid ambiguity.

•  The  syntax  is  straightforward.  We  want  to  be  able  to  write  a  grammar  that  serves  to 
document  the  syntax  in  a  way  that  is  readable  by  programmers. 

The issue of ambiguity is particularly important and it is enlightening to contrast
English with useful artificial languages in this regard. For example, while English does
not allow the use of parentheses to force a particular parse structure, most
programming languages do. While English does not exploit rules like operator precedence, and
instead allows great flexibility in organizing sentences, most programming languages
do exploit such rules and are defined by grammars that force a single interpretation in
all cases. So, while many English sentences are highly ambiguous, most programming
language statements are not. Contrast:

The  boy  and  the  girl  with  the  red  wagon  bought  a  pencil  and  a  book  with  a 
floppy  cover, 

with 

17 + 12*(4*8)*4 + 7.

The English sentence is ambiguous given any reasonable grammar of English. The
arithmetic expression is ambiguous given some arithmetic expression grammars. But,
as we saw in Example 11.19, it is straightforward to design an unambiguous grammar
for arithmetic expressions.
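
One way to see how an unambiguous grammar forces a single interpretation is to write a parser that follows it directly. The Python sketch below assumes a conventional layered grammar (E for sums, T for products, F for numbers and parenthesized expressions); we have not checked it against the grammar of Example 11.19, and all of the function names are ours. The tokenizer is deliberately simple and ignores any character it does not recognize.

```python
import re


def tokenize(text):
    """Split an expression into integer, operator, and parenthesis tokens."""
    return re.findall(r"\d+|[+*()]", text)


def parse_expr(tokens, pos=0):
    """E -> T (+ T)*   Sums bind loosest and group to the left."""
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_term(tokens, pos + 1)
        value += right
    return value, pos


def parse_term(tokens, pos):
    """T -> F (* F)*   Products bind tighter than sums."""
    value, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "*":
        right, pos = parse_factor(tokens, pos + 1)
        value *= right
    return value, pos


def parse_factor(tokens, pos):
    """F -> ( E ) | number"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "(":
        value, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected )")
        return value, pos + 1
    if tokens[pos].isdigit():
        return int(tokens[pos]), pos + 1
    raise SyntaxError(f"unexpected token {tokens[pos]!r}")


def evaluate(text):
    """Parse and evaluate a whole expression, rejecting trailing tokens."""
    tokens = tokenize(text)
    value, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return value
```

Because each grammar level corresponds to exactly one function, there is only one way to parse the expression above, and evaluate("17 + 12*(4*8)*4 + 7") returns 1560.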

Unfortunately, some convenient programming language constructs present
challenges to the design of unambiguous grammars. The most common is the if statement
that allows an optional else clause. We discussed this problem, generally known as the
dangling else problem, in Example 11.20. Recall the following statement that we
presented there:

if cond1 then if cond2 then st1 else st2

The  problem  is  that,  if  we  use  a  straightforward  grammar  that  makes  else  clauses 
optional,  then  this  statement  has  two  parses  (and  thus  two  meanings): 

if cond1 then [if cond2 then st1 else st2]

if cond1 then [if cond2 then st1] else st2

The  designers  of  any  programming  language  that  has  this  construct  must  solve  the 
problem  in  one  of  two  ways: 

•  Rely on delimiters to disambiguate nested if statements. Languages such as Algol
68, Modula-2, Ada, Lisp, and Scheme take this approach. For example, in Algol 68,
one would write:

if i = 0 then
   if j = 0 then
      x := 0
   fi          /* In Algol 68, each delimiter x had a matching close
               /* delimiter x^R. So if/fi.
   else x := 1
fi

Or, in Scheme, one would write:

(if (= i 0) (if (= j 0) (set! x 0)) (set! x 1))

It is clear, in both these cases, that the single else clause (which sets x to 1) goes
with the first if.

•  Dispense with delimiters and substitute an arbitrary decision that can be encoded
in the grammar. Languages such as C and Java take this approach, which we
illustrated with a fragment of a Java grammar in Example 11.20. Of course, the main
drawback to this approach is that programmers may not always be aware which
arbitrary decision will be made; they may write their code assuming that something
different will happen. So this approach can lead to programmer errors, which can
sometimes be caught by audit rule checkers.
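
The second approach can be illustrated with a small recursive-descent parser. The Python sketch below is ours and is not taken from any real compiler. It assumes a toy statement language in which conditions and simple statements are single tokens, and it resolves the ambiguity the way C and Java do, by attaching each else to the nearest if that does not yet have one.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement starting at tokens[pos]; return (tree, next_pos).

    stmt -> 'if' cond 'then' stmt ['else' stmt] | simple-statement
    An 'else' is consumed as soon as it is seen, so it attaches to the
    nearest 'if' that is still open.
    """
    if tokens[pos] != "if":
        return tokens[pos], pos + 1  # a simple statement such as st1
    cond = tokens[pos + 1]
    if tokens[pos + 2] != "then":
        raise SyntaxError("expected 'then'")
    then_branch, pos = parse_stmt(tokens, pos + 3)
    else_branch = None
    if pos < len(tokens) and tokens[pos] == "else":
        else_branch, pos = parse_stmt(tokens, pos + 1)
    return ("if", cond, then_branch, else_branch), pos


# The ambiguous statement from the text, split into tokens on whitespace.
tree, _ = parse_stmt("if cond1 then if cond2 then st1 else st2".split())
```

Here tree is ("if", "cond1", ("if", "cond2", "st1", "st2"), None), which is the first of the two parses shown earlier: the else clause belongs to the inner if.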

So there are cases like the dangling else problem that are known to create
ambiguity. And there are examples of grammars that can be shown (as we did in Example
11.18) to be unambiguous. Unfortunately, the undecidability results of Section 22.5
make it clear that there can exist no general tools that can tell the difference. So, in
particular, there exist no general tools that can:

•  decide  whether  a  proposed  language  is  inherently  ambiguous,  or 

•  decide  whether  a  proposed  grammar  is  ambiguous. 

G.4 Compilers for Programming Languages

The job of a compiler can be broken down into the following pieces:

•  lexical analysis,

•  syntactic  analysis, 

•  code  generation  and  optimization,  and 

•  error  checking,  which  must  be  done  at  each  step  of  the  process. 

Both  lexical  analysis  and  syntactic  analysis  are  driven  primarily  by  the  theory  that 
has  been  presented  in  this  book,  which  also  tells  us  something  about  what  kinds  of 
error  checking  are  possible.  In  addition,  the  computability  results  we  have  presented 
have  implications  for  our  ability  to  design  effective  optimizers. 

G.4.1 Lexical Analysis

The job of a lexical analyzer is to transform a string of input characters into a string of
tokens (typically corresponding to the smallest meaningful units in the language). The
character  patterns  that  correspond  to  the  allowable  tokens  are  generally  described 
with  regular  expressions.  See  Section  15.1  for  a  discussion  of  how  lexical  analysis  is 
done  and  the  tools  that  are  available  for  building  lexical  analyzers. 


G.4.2 Syntactic Analysis

The  job  of  a  syntactic  analyzer  is  twofold: 

•  to  transform  a  sequence  of  tokens  into  a  parse  tree  that  represents  the  structure  of 
the  input  and  that  can  be  used  as  the  basis  for  generating  code,  and 

•  to  check  for  errors. 


We've seen that most of the syntactic structure of most programming languages can
be described with a context-free grammar. But there are features, for example type
constraints, that cannot. So one approach might be to move outward in the language
hierarchy. For example, we might use a context-sensitive grammar instead of a
context-free one. That would solve many of the problems, but it would introduce a new one.
Recall, from Chapters 24 and 29, that the best known algorithm for examining a string
and deciding whether or not it could be generated by an arbitrary context-sensitive
grammar takes time that is exponential in the length of the input string. Some programs
are very long and compilers need to be fast, so that is not an acceptable solution. As
a result, the way practical syntactic analyzers work is to:

•  exploit a context-free grammar that describes those features that it can,

•  use a deterministic parser such as one of the ones described in Sections 15.23 and [...], and

•  augment the parser with specific code, for example a symbol table, to handle the
noncontext-free features of the language.
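
The symbol-table idea mentioned above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a description of any real compiler: we assume that the parser has already reduced the program to a list of (kind, name) pairs, where kind is either "declare" or "use", and we ignore scopes and types. The function name undeclared_uses is ours.

```python
def undeclared_uses(statements):
    """Return the names that are used before any declaration of them.

    statements is a list of (kind, name) pairs, where kind is
    "declare" or "use". Scoping and type information are ignored.
    """
    symbol_table = set()
    errors = []
    for kind, name in statements:
        if kind == "declare":
            symbol_table.add(name)
        elif kind == "use" and name not in symbol_table:
            errors.append(name)
    return errors


# The two programs of Example G.4, reduced to declarations and uses.
legal = [("declare", "todayistuesday"), ("use", "todayistuesday")]
illegal = [("declare", "todayiswednesday"), ("use", "todayistuesday")]
```

Under these assumptions, undeclared_uses(legal) is the empty list, while undeclared_uses(illegal) is ["todayistuesday"]. The check is a single pass with set lookups, so it adds little to the cost of parsing.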

G.4.3 Optimization

Optimizing compilers play an important role in the development of modern software.
But, unfortunately, the undecidability results that we have discussed, particularly the
ones we summarized in Section 21.5, describe some clear limits on what these
compilers can do.

For example, consider the problem of dead code elimination. Programmers do not
intentionally write code that can never be reached at run time. But programs that have
been around for a while tend to accrete such dead code as a result of changes that
affect overall control flow. Is it possible to build a compiler that checks for dead code and
simply eliminates it? The answer is no, and it follows directly from Theorem 21.13,
which tells us that the language {<M, q> : Turing machine M reaches q on some
input} is not in D.


G.4.4  Compile-Time  Error  Checking 

It is safer and substantially more efficient to detect errors at compile time, rather than
waiting until run time to do so. The theory that has been presented in this book
provides substantial insight into ways of doing this. It also defines limits on what any sort
of compile-time error-checking process can do.

Errors  that  Can  be  Caught  by  a  Context-Free  Parser 

Some  errors  are  easy  to  detect.  They  can  be  caught  by  a  context-free  parser  because 
they  result  in  strings  that  are  outside  the  language  that  is  generated  by  the  grammar 
on  which  the  parser  operates.  For  example,  given  the  expression  grammar  that  we 
have  used  throughout  this  book,  the  expressions  id  id  and  id  +  +  are  syntactically 
ill-formed. 

Ill-formed strings present a challenge to parser designers, however. It is generally
unsatisfactory to find the first error in a program, stop, and report that the parser failed.
Instead the parser should try to find a point (perhaps a statement boundary), from which
it can partially start over. Then it can continue reading the rest of the program and
checking for additional errors. To start over, the parser needs to figure out how much
input to skip and how much of what is on its stack should be popped and discarded. See
[Aho, Sethi, and Ullman 1988] for a discussion of various ways of doing this.

Error  Questions  that  are  Decidable  but  not  by  a  Context-Free  Parser 

As  we  have  already  discussed,  context-free  grammars  are  unable  to  capture  the  type 
constraints  imposed  by  most  programming  languages.  However,  all  of  the  following 
questions  are  decidable. 

•  Given a program P, are all the variables in P declared?

•  Given a program P, are all the variables in P used only in operations that are
defined for their type?

•  Given a program P, do all the function calls in P have the correct number of arguments?

Undecidable Error Questions

Unfortunately,  as  we  saw  in  Chapter  21  and,  in  particular  in  Section  21.5,  there  are 
other  questions  about  the  correctness  of  programs  that  are  not  decidable.  The  most 
basic  is: 

1.  Given a program P, does P halt on all inputs? So no compiler can offer to find all
infinite loops in programs.

But there are others. For example, we showed, in Theorem 21.13, that it is
undecidable whether a program reaches some particular state (place in the code) on any input.
The question is also undecidable if we ask about a particular input or about all inputs.

So  all  of  the  following  questions  are  undecidable. 

2.  Given  a  program  P  and  a  variable  x,  is  x  always  initialized  before  it  is  used? 

3.  Given a program P and a file f, does P always close f before it exits?

4.  Given  a  program  P  and  a  section  of  code  s  within  P,  is  s  dead  code  (i.e.,  code  that 
can  never  be  reached)? 

Some  other  undecidable  questions  include: 

5.  Given  a  program  P  and  a  division  statement  with  denominator  x,  is  x  always 
nonzero  when  the  statement  is  executed? 

6.  Given a program P with an array reference of the form a[i], will i, at the time of
the reference, always be within the bounds declared for the array?

7.  Given a program P and a database of objects d, does P perform the function f on
all elements of d?

We will show that question 5 is undecidable. The proofs for questions 2, 3, 6, and 7 are
left as exercises in Chapter 21.

THEOREM  G.1  "Does  a  Program  Divide  by  Zero?"  is  Undecidable 

Theorem: The language L2 = {<M, s> : s is the statement number of a division
statement to be executed by Turing machine M and, whenever M executes
statement s, the denominator is nonzero} is not in D.

Proof: We show H ≤ L2 and so L2 is not in D. Define:

R(<M, w>) =

1.  Construct the description <M#> of a new Turing machine M#(x) that,
on input x, operates as follows:

1.1.  Erase the tape.

1.2.  Write w on the tape.

1.3.  Run M on w.

1.4.  x = 1/0.

2.  Return <M#, 1.4>.

{R, ¬} is a reduction from H to L2. If Oracle exists and decides L2, then
C = ¬Oracle(R(<M, w>)) decides H. R and ¬ can be implemented as Turing
machines. And C is correct. Note that there are no explicit division statements in
M (since division is not a primitive Turing machine operation). So the only
division statement in M# is in step 1.4. Thus:

•  If <M, w> ∈ H: M halts on w, so M# makes it to step 1.4, where it attempts to
divide by 0. Oracle rejects, so C accepts.

•  If <M, w> ∉ H: M does not halt on w, so M# gets stuck in step 1.3. It never
executes step 1.4, so, trivially, it is true that on all attempts the denominator is
nonzero. So Oracle accepts and C rejects.

But  no  machine  to  decide  H  can  exist,  so  neither  does  Oracle. 

While,  as  we  have  just  seen,  there  are  program  errors  that  cannot  be  guaranteed  to 
be  caught  and  there  are  program  properties  that  cannot  be  proven  to  hold,  there  are 
many  useful  situations  in  which  it  is  possible  to  prove  that  a  program  meets  some  or  all 
of  its  specifications.  We  return  to  this  topic  in  the  next  chapter,  where  we  will  discuss  a 
variety  of  tools  that  support  both  programming  and  software  engineering. 


G.5  Functional  Programming  and  the  Lambda  Calculus 

In the 1930s, Alonzo Church, Alan Turing, and others were working on the problem
that had come to be known as the Entscheidungsproblem. They sought an answer to
the question, "Does there exist an algorithm to decide whether a sentence in first-order
logic is valid?" They all realized that to answer the question, particularly in the
negative, they needed a formal definition of what an algorithm was. Turing's proposal most
closely matched the procedural approach to computing that seemed natural to most
early programmers. Thus the theory that we have been discussing is based primarily on
Turing machines.

But Church's proposal, the lambda calculus, laid the groundwork for an alternative
approach to programming that has had an important influence on the modern
programming language landscape. In this approach, called functional programming, one
defines a program as a function to be computed, rather than as a sequence of specific
operations (possibly with side effects) to be performed.

In 1960, John McCarthy published a paper [McCarthy 1960] in which he described
Lisp, a language that was directly inspired by the lambda calculus. Today Lisp is the
second oldest surviving programming language. (The oldest is Fortran.) Lisp remains
the programming language of choice for many kinds of symbolic computing
applications. It is the platform that supports a variety of production tools, including Emacs,
a flexible and extensible text editor, knowledge-based systems like Scone and KM,
the music composition tool Common Music, and a user extension language for the
popular computer-aided design tool AutoCAD, just to name a few. It also inspired the
design of an entire family of functional programming languages, including modern
languages like ML and Haskell.

The easiest way to begin to understand Lisp is to consider some examples, but one
caveat is required. Lisp is no longer a single language. It is a whole language family,
each member of which has a different syntax. For illustrative purposes, we will use an
easy-to-read syntax close to that of early Lisp systems; the various modern Lisp
dialects will differ in some details.

We  begin  with  a  simple  expression. 

(LAMBDA  (X)  (TIMES  X  X)) 

This expression defines a function of one argument that returns the square of its
argument. We often want not just to define functions but to give them names by which they
can be referred. We can assign our function the name SQUARE by writing the following.

(DEFINE  (SQUARE  (LAMBDA  (X)  (TIMES  X  X)))) 

Since  that  syntax  is  clunky,  there  is  an  easier  one. 

(DEFUN SQUARE (X) (TIMES X X))

In this alternative syntax, it is no longer necessary to write LAMBDA explicitly. DEFUN
takes three arguments: the name of the function, a list of the names of the function's
arguments, and the body of the function. Named functions can call themselves
recursively. So we can write the following.

(DEFUN FACTORIAL (X)
   (COND ((EQUAL X 1) 1)
         (T (TIMES X (FACTORIAL (SUB1 X))))))

Read this definition as follows: Define the function FACTORIAL of one argument X.
To compute it, evaluate a CONDitional expression with two branches. If X equals 1,
then return 1. Otherwise (written as T, standing for True), return the result of
multiplying X times (X-1)!. The FACTORIAL function is in many ways a toy; it could easily be
implemented with a loop in most programming languages, including modern dialects of
Lisp. But it illustrates, for a simple case, the power of recursion. We'll mention less
trivial applications shortly.
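
For readers who are more at home in a procedural language, the two definitions above can be transcribed almost word for word. The Python sketch below is our own translation, not part of the Lisp tradition being described. Like the Lisp version, it assumes that the argument is a positive integer and does not guard against other inputs.

```python
def square(x):
    """Counterpart of (DEFUN SQUARE (X) (TIMES X X))."""
    return x * x


def factorial(x):
    """Counterpart of the recursive FACTORIAL definition.

    The 'if' plays the role of the first COND branch, and the final
    return plays the role of the T (otherwise) branch.
    Assumes x is a positive integer.
    """
    if x == 1:
        return 1
    return x * factorial(x - 1)
```

As in the Lisp original, factorial(5) evaluates to 120.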

In the early dialects of Lisp, there were only two kinds of objects: primitive objects,
called atoms, and lists. Numbers, strings, and Booleans were all atoms. In this view,
anything with an internal structure is a list. A list is written as a sequence of objects enclosed
in parentheses. So, for example, (A B C) is a list that contains three atoms. A list may
contain another list as one of its elements. So, for example, we could have (A B (C D)),
which is a list of three elements, the last of which is a list of two elements. The following
list corresponds to a complete binary tree with a root node labeled A and a height of 3.
The first element of each list is the label of the root node to which the list corresponds.
The next two elements describe the node's two subtrees.

(A  (B  (D  E))(C  (F  G))) 

Notice now that the definition we wrote for the FACTORIAL function is a list (with
sublists). In Lisp, programs are represented as lists, the data type that Lisp is best at
manipulating, as we'll see below. To parse a program is easy: the tree structure of a
program is exactly its structure as a set of nested lists. So it is straightforward in Lisp for
programs to manipulate and modify other programs. Early Lisp programmers took
advantage of this and wrote functions that explicitly manipulated the code of other
functions. We'll see one example of this below, although it is now generally regarded as
bad software practice. But the fact that Lisp allowed programs to access other programs
and the environments in which they were executed led to the development, within Lisp,
of arguably the most powerful macro facility in any modern programming language.

Modern dialects of Lisp have evolved substantially more sophisticated data typing
systems than the simple atoms-and-lists model that McCarthy introduced. But the
notion that programs (functions) are data objects remains a key feature of the language
and a major source of the flexibility that gives Lisp its power.

In the Lisp programming environment, computation occurs when an expression
(an atom or a list) is evaluated. Constants (including numbers, as well as the Boolean
constants T and F) evaluate to themselves. Variables evaluate to their values. Lists are
evaluated by treating the first element as a function and the remaining elements as the
arguments to which the function should be applied. So, for example, we might write:

(FACTORIAL  5) 

This expression, when evaluated, will apply the FACTORIAL function to the value 5
and return 120. Before a function can be applied to its arguments, each of them must be
evaluated. This wasn't obvious in the (FACTORIAL 5) case because the atom 5
evaluates to 5. But if the variable X has the value 3, then (FACTORIAL X) will return 6.

Lists can be written, as we have been doing, as constants that are specified by the
programmer and they can be read as input. Or they can be constructed within a
program. Lisp provides a set of primitives for constructing lists and for taking them apart.
The function LIST takes any number of arguments and puts them together to make a
list. So we could write the following.

(LIST  'A  (FACTORIAL  5)) 

When evaluated, that expression will build a list with two elements. The first will be
the symbol A. (Remember that arguments are evaluated before functions are applied to
them. The quote mark suppresses that evaluation. If we had omitted the quote mark in
front of A, A would have been treated as a variable and evaluated. Then its value,
whatever it is, would have become the first element of the new list.) The second element of
the list will be the result of evaluating (FACTORIAL 5). So the new list will be (A 120).

The  function  CONS  (for  constructor)  adds  an  element  to  the  front  of  an  existing  list. 
So  we  could  write  the  following  expression,  which  will  return  the  list  (B  A  120). 

(CONS 'B '(A 120))

Lists can be broken apart into their pieces using two primitive functions whose
names are historical accidents: CAR returns the first element of the list it is given. CDR
returns its input list with the first element removed. So:

(CAR '(B A 120)) evaluates to B.

(CDR '(B A 120)) evaluates to (A 120).
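
The behavior of these three primitives can be mimicked with ordinary Python lists. The sketch below is only an analogy and rests on two assumptions of ours: Lisp symbols are represented as Python strings, and each operation builds a new list instead of sharing structure the way Lisp cons cells do.

```python
def cons(item, lst):
    """Return a new list with item added to the front of lst."""
    return [item] + lst


def car(lst):
    """Return the first element of a nonempty list."""
    return lst[0]


def cdr(lst):
    """Return the list with its first element removed."""
    return lst[1:]


# Counterpart of (CONS 'B '(A 120)); the symbols A and B become strings.
new_list = cons("B", ["A", 120])
```

Here new_list is ["B", "A", 120], car(new_list) is "B", and cdr(new_list) is ["A", 120].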

In most programming environments, the semantics of the language are implemented
in a black box runtime environment that cannot be accessed at the program level. This
isn't true in Lisp. Functions don't just look like lists. They are lists and operations can
be performed on them. The most important operation that can be performed on
functions is evaluation. Lisp provides functions, including EVAL, APPLY, and FUNCALL, that
can be invoked from within any Lisp program. These functions explicitly evaluate
expressions. For example, EVAL takes a single argument and evaluates it. To see how it
works, suppose that we want to read in P, a polynomial function of one variable, and
then apply it to several values. Then we might input the following list, corresponding to
the polynomial 7x^2 - 2x + 3.

(PLUS (DIFFERENCE (TIMES 7 (SQUARE X)) (TIMES 2 X)) 3)

Note that functions in Lisp are written in prefix notation. (Modern Lisps use the
more compact symbols +, *, and - though.) Suppose that the variable P now has that
list as its value and the variable X has the value 4. Then the following expression will
return the value 107.

(EVAL  P) 
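
To show what an evaluator of this kind has to do, here is a toy version written in Python. It is a sketch under several assumptions of ours and is far simpler than a real Lisp EVAL: expressions are nested Python lists, symbols are strings, variables are looked up in a dictionary that we call env, and only the four functions used in the polynomial are supported. The names FUNCTIONS and eval_expr are hypothetical.

```python
import operator

# The only functions this toy evaluator knows about.
FUNCTIONS = {
    "PLUS": operator.add,
    "DIFFERENCE": operator.sub,
    "TIMES": operator.mul,
    "SQUARE": lambda x: x * x,
}


def eval_expr(expr, env):
    """Evaluate a prefix expression given variable bindings in env."""
    if isinstance(expr, (int, float)):
        return expr  # constants evaluate to themselves
    if isinstance(expr, str):
        return env[expr]  # variables evaluate to their values
    # A list: evaluate the arguments first, then apply the function.
    function = FUNCTIONS[expr[0]]
    args = [eval_expr(arg, env) for arg in expr[1:]]
    return function(*args)


# The polynomial 7x^2 - 2x + 3 as a nested list.
P = ["PLUS",
     ["DIFFERENCE", ["TIMES", 7, ["SQUARE", "X"]], ["TIMES", 2, "X"]],
     3]
```

With X bound to 4, eval_expr(P, {"X": 4}) returns 107, matching the value given in the text. The same list P can be evaluated again with a different binding for X, which is the point of reading the polynomial in as data.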

Functions can also be passed as parameters to other functions. Suppose, for
example, that we want to write a program that takes two inputs, a function and a list.
We want our program to apply the function to each element of the list and return a
new list with the resulting values. For example, if given SQUARE and the list (4 3 7), it
should return (16 9 49). The Lisp function MAPCAR does exactly this. Given a
function F and a list L, it first applies F to the first element of L, namely (CAR L). Then it
applies it to the second element of L, namely (CAR (CDR L)). And so forth. It
returns a new list that contains the values that it produced. So we can write the
following expression.

(MAPCAR  F  L) 

If F has the value (LAMBDA (X) (* X X)) and L has the value (4 3 7), the result of
EVALing our expression will be (16 9 49). Suppose, on the other hand, that F has as its
value a function that returns, for any letter in the Roman alphabet, its successor (letting
the successor of Z be A). And suppose that L has the value (C A T). Then EVALing
(MAPCAR F L) will produce (D B U).
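
Both MAPCAR examples can be reproduced with a short Python analogue. This is a sketch, not Lisp: `mapcar` and `successor` are names chosen here for illustration, and a Python list plays the role of the Lisp list.

```python
def mapcar(f, lst):
    """Apply f to each element of lst and return a new list of the results."""
    return [f(item) for item in lst]

def successor(letter):
    """Next letter of the Roman alphabet, letting the successor of Z be A."""
    return chr((ord(letter) - ord("A") + 1) % 26 + ord("A"))

squares = mapcar(lambda x: x * x, [4, 3, 7])
shifted = mapcar(successor, ["C", "A", "T"])
print(squares)  # [16, 9, 49]
print(shifted)  # ['D', 'B', 'U']
```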

Modern  dialects  of  Lisp  provide  an  even  more  powerful  mechanism  for  treating 
functions as first-class objects. They enable programs to construct and exploit closures,
i.e., function definitions coupled with evaluation environments that bind values to variables.
To see how closures can be used, consider the following example. BOSS wants a
list of candidates whose score on some dimension called KEY is at least THRESHOLD. To
get the list, BOSS executes (GETSOME THRESHOLD). GETSOME considers many criteria
and has access to various sources of candidates. Each such source maintains its own list
of possible candidates and it accepts two inputs, the number of candidates desired and
a function (of a single argument) that describes a test to be performed on candidates;
only those that pass the test will be suggested. Let’s call one of the sources WELL.
GETSOME can be defined as follows.

(DEFUN GETSOME (THRESHOLD)
   ... /* Consider other things.
   (WELL K #'(LAMBDA (X) (AND (TEST1)
                              (TEST2)
                              (> (KEY X) THRESHOLD))))
   ... /* Consider other things.
)

When the expression (WELL ...) is evaluated, the first thing that happens is that its
arguments are evaluated. Its first argument evaluates to the value of the variable K, which
is assumed to be a number. Its second argument begins with the symbol #', which is
shorthand for a function (called FUNCTION) that forms a closure by capturing the current
values of all of the free variables in the enclosed expression. In this case, there is a
single such variable, THRESHOLD. So the closure that is formed contains the function
described by the LAMBDA expression plus the current value of THRESHOLD. Without
closures one could imagine simply passing THRESHOLD as another argument to WELL. But
WELL may be called by many different kinds of customers. It doesn’t know what
THRESHOLD is. All it knows how to do is to select candidates it wants to recommend and
then apply a single test to see which ones will be acceptable to its customer. Its
customer (in this case GETSOME) must therefore describe all of the tests it cares about as a
single function that can be applied to a candidate.
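
The division of labor between GETSOME and WELL can be sketched in Python, whose lambda expressions also form closures. Everything concrete here is an assumption made for illustration: the candidate records, their `key` field, and the default value of `k` are invented, and TEST1 and TEST2 are omitted.

```python
def well(k, test):
    """Hypothetical source: suggest up to k of its own candidates that pass test."""
    candidates = [{"name": "ann", "key": 7},
                  {"name": "bob", "key": 3},
                  {"name": "cy", "key": 9}]
    return [c for c in candidates if test(c)][:k]

def get_some(threshold, k=2):
    # The lambda refers to threshold, which is free inside the lambda but bound
    # in this call; well() applies the test without ever seeing threshold.
    return well(k, lambda x: x["key"] > threshold)

names = [c["name"] for c in get_some(5)]
print(names)  # ['ann', 'cy']
```

The point of the design is that `well` needs only one interface, a one-argument test, no matter how many private parameters its various customers use to build that test.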

We've  now  seen  enough  to  be  able  to  comment  on  some  of  the  key  ideas  that  under¬ 
lie  Lisp  and  that  play  a  key  role  in  modern  high-level  programming  languages: 

• In Lisp, the most important data structure is the list. Lists can be constructed by
programs at run time. Their sizes and structures need not be declared in advance. It
is very easy to write programs that manipulate dynamic lists and trees. So, for example,
we were able above to read in a polynomial function of arbitrary length and
then EVAL it. There is no need to declare in advance what the size of any structure
is. Thus Lisp had to provide run-time storage allocation and garbage collection. (We
should point out here, though, that Lisp was not the first list processing language.
That title belongs to the IPL family of languages.)

•  In  Lisp,  one  describes  a  computation  as  a  function  to  be  evaluated.  While  Lisp  was 
not  the  first  list-processing  language,  it  was  the  first  functional  one.  Lisp  provided  a 
few primitive functions and it allowed programmers to define new ones. To make
this possible, Lisp introduced:

• conditional expressions: The Lisp COND function, which can have any number
of branches, is the precursor of the modern if-then-else or case statement.
We take such control statements for granted, but at the time that Lisp was first
described, Fortran, for example, only had a conditional go to statement.

• recursion: Another reason that Lisp introduced run-time storage allocation
and garbage collection was to make recursion possible.

• functions as first-class objects: Functions can be manipulated by other functions
and passed to other functions as arguments. In early implementations of
Lisp, the runtime manipulation of functions was possible because the language
was interpreted, rather than compiled. The job of the interpreter was to transform
a list into something executable and then run it. While this execution
model was flexible, it was also slow. Modern implementations of Lisp also provide
compilers, with the consequence that, while Lisp maintains its flexibility, it
no longer necessarily incurs a runtime performance penalty.

• Lisp functions are lists, a data type that Lisp programs can manipulate. Because
they are lists, they can be stored within other data structures and they can be parsed
easily at either compile time or run time.

In defining Lisp, McCarthy did more than just create a new and arguably convenient
way to program. He also made clear the connection between the power of the
new  language  and  the  fundamental  notions  of  computability  as  investigated  by  Turing 
and  Church.  In  particular,  McCarthy  showed  that  the  class  of  functions  that  can  be 
computed  in  Lisp  is  exactly  the  computable  functions  (i.e.,  exactly  the  set  that  can  be 
computed  by  some  Turing  machine). 

The  Lisp  language,  as  originally  described  by  McCarthy,  evolved  into  a  family  of  dialects 
that  became  the  programming  languages  of  choice  for  the  development  of  many  kinds  of 
artificial intelligence (AI) systems. The next few examples illustrate some of the reasons
that Lisp was, and is, so well suited to the needs of AI programmers.


EXAMPLE G.5 Search: Exploiting Recursion and Function Objects

Lisp’s  natural  control  structure  is  recursion.  This  contrasts  with  the  iterative  con¬ 
trol  structures  that  were  the  mainstay  of  other  early  programming  languages. 
What many AI programs do is search, and search can easily be described recursively.
To evaluate a situation to see whether it can lead to a solution to a problem,
we generate all of the situations that can be reached via a single action from the
current one. If any of them is a goal, we have found our answer. Otherwise, we call
the evaluation procedure recursively on each of the successor states. In Section
30.3.2, we describe A*, a general-purpose, best-first search algorithm that does
this, and in N.2.5, we describe minimax, an alternative that is tailored specifically
to searching two-person game trees.

While  recursion  is  key  in  implementing  almost  any  search  algorithm,  another 
feature  of  Lisp  is  also  important  in  implementing  best-first  search  algorithms 
like A* and minimax: The input to all such programs is a problem definition,
which  must  contain  two  functions:  successors,  which  computes  a  node’s  succes¬ 
sors  and  assigns  costs  to  individual  moves,  and  h,  a  heuristic  evaluation  function. 
In  Lisp,  it  is  easy  to  write  A*  or  minimax  once  and  then  to  pass  it,  as  parameters, 
the  functions  that  it  needs. 
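
The idea of writing the search once and passing in the problem-specific functions can be sketched as follows. This is a compact Python rendering of A*, not the presentation from Section 30.3.2; the parameter names `is_goal`, `successors`, and `h` and the toy problem are assumptions made for illustration.

```python
import heapq
from itertools import count

def a_star(start, is_goal, successors, h):
    """Best-first search; successors(s) yields (next_state, move_cost) pairs."""
    tie = count()                       # tie-breaker so states are never compared
    frontier = [(h(start), next(tie), 0, start, [start])]
    best_cost = {start: 0}
    while frontier:
        _, _, cost, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return cost, path
        if cost > best_cost.get(state, float("inf")):
            continue                    # stale entry: a cheaper route was found
        for nxt, step in successors(state):
            new_cost = cost + step
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(frontier, (new_cost + h(nxt), next(tie),
                                          new_cost, nxt, path + [nxt]))
    return None                         # no goal state is reachable

# Toy problem: reach 10 from 0 using +1 moves (cost 1) or +3 moves (cost 2).
result = a_star(0,
                lambda s: s == 10,
                lambda s: [(s + 1, 1), (s + 3, 2)] if s < 10 else [],
                lambda s: 0)            # h = 0 never overestimates
print(result[0])  # 7
```

Only the three function arguments change from one problem to the next; the body of `a_star` is untouched.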


EXAMPLE G.6 Representing Parse Trees as Lists

To  see  why  list-like  structures  that  need  not  be  declared  in  advance  are  useful  in  AI, 
consider  the  natural  language  (NL)  understanding  problem.  An  NL  understanding 
program  must  accept  as  input  one  or  more  sentences.  In  Lisp,  this  is  easy. The  input 
text  can  be  represented  as  a  list  of  words.  So  assuming  that  we  have  defined  symbols 
that  correspond  to  the  words  in  our  dictionary,  we  might,  for  example,  have: 

(the  smart  cat  smells  chocolate) 

An  early  step  in  understanding  sentences  is  usually  to  parse  them.  Recall  that, 
in  Example  11.11,  we  showed  the  following  parse  tree: 


[Figure: parse tree with root S and leaves: the smart cat smells chocolate]


This  tree  can  easily  be  represented  as  a  Lisp  list,  in  which  each  node  is  a  list. 
The first element of the list is the label attached to the node. The remaining elements
are the node's subtrees. So we have:

(S (NP (the) (Nominal (Adjs (Adj (smart))) (N (cat))))
   (VP (V (smells)) (NP (Nominal (N (chocolate))))))
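
As a rough illustration of how easily such a structure is traversed, the same tree can be written as nested Python lists and walked recursively. This is a sketch in Python rather than Lisp, and `leaves` is a name chosen here, not a standard function.

```python
tree = ["S",
        ["NP", ["the"],
               ["Nominal", ["Adjs", ["Adj", ["smart"]]], ["N", ["cat"]]]],
        ["VP", ["V", ["smells"]],
               ["NP", ["Nominal", ["N", ["chocolate"]]]]]]

def leaves(node):
    """Return the labels of the leaf nodes, left to right."""
    label, *subtrees = node          # first element is the label, rest are subtrees
    if not subtrees:
        return [label]
    words = []
    for sub in subtrees:
        words.extend(leaves(sub))
    return words

print(leaves(tree))  # ['the', 'smart', 'cat', 'smells', 'chocolate']
```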


EXAMPLE  G.7  Representing  Logical  Formulas  as  Lists 

In many task domains, knowledge can be represented as sentences in first-order
logic. These sentences, in turn, can easily be represented as Lisp lists. For example,
the sentence:

∀x (∃y ((P(x) ∧ R(x)) → Q(y)))

can be represented as the list:

(FORALL X
   (EXISTS Y
      (IMPLIES (AND (P X)
                    (R X))
               (Q Y))))

The sentence can then be evaluated by recursively evaluating its subexpressions.
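
One way to picture that recursive evaluation is the Python sketch below. It makes a strong simplifying assumption that is not in the text: the quantifiers range over a small finite domain, and the predicates P, R, and Q are given as ordinary functions. The function name `holds` and the sample interpretation are invented for illustration.

```python
def holds(formula, env, domain, preds):
    """Evaluate a formula (a nested list) by recursing on its subexpressions."""
    op = formula[0]
    if op == "FORALL":
        _, var, body = formula
        return all(holds(body, {**env, var: d}, domain, preds) for d in domain)
    if op == "EXISTS":
        _, var, body = formula
        return any(holds(body, {**env, var: d}, domain, preds) for d in domain)
    if op == "AND":
        return all(holds(f, env, domain, preds) for f in formula[1:])
    if op == "IMPLIES":
        return (not holds(formula[1], env, domain, preds)
                or holds(formula[2], env, domain, preds))
    # Otherwise the formula is atomic, such as ["P", "X"].
    return preds[op](*(env[v] for v in formula[1:]))

sentence = ["FORALL", "X",
            ["EXISTS", "Y",
             ["IMPLIES", ["AND", ["P", "X"], ["R", "X"]], ["Q", "Y"]]]]

# Hypothetical interpretation over a three-element domain.
domain = [1, 2, 3]
preds = {"P": lambda x: x > 1, "R": lambda x: x < 3, "Q": lambda y: y == 3}
print(holds(sentence, {}, domain, preds))  # True
```

Under this interpretation the sentence is true because choosing y = 3 makes Q(y) hold for every x.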


EXAMPLE G.8 Automatic Programming

In  Lisp,  there  is  no  distinction  between  programs  and  data.  Lisp  programs,  like 
everything  else,  are  lists.  This  turns  out  to  be  particularly  useful  if  one  wants  to 
build  programs  that  can  write  or  modify  other  programs.  So,  for  example,  one 
subfield of AI is called automatic programming or program synthesis. Its
goal  is  to  automate  the  task  of  writing  code  to  solve  a  problem  whose  specifica¬ 
tions  have  been  provided.  Of  course,  the  problem  of  deciding  what  code  to 
write  remains  hard,  but  in  Lisp  it  is  straightforward  to  build  up  a  program  by 
composing  lists.  And  the  code  that  is  built  in  that  way  can  be  run,  as  part  of  the 
coding  and  debugging  process,  in  exactly  the  same  way  that  any  Lisp  expres¬ 
sion  is  evaluated. 


EXAMPLE  G.9  Learning  to  Improve  Performance 

An  important  characteristic  of  an  intelligent  system  is  its  ability  to  learn  and  to 
improve  its  performance  on  the  basis  of  experience.  If  a  performance  program  is 
written  in  Lisp,  then  a  learning  program,  also  written  in  Lisp,  can  modify  the  per¬ 
formance  program  using  Lisp's  basic  list  operations.  For  example,  suppose  that  we 
want  to  build  a  program  that  evaluates  its  environment  and,  on  the  basis  of  what 
it  sees,  decides  how  to  perform  a  task.  We  might  write  such  a  program  in  Lisp 
using  the  following  structure: 

(COND ((AND (condition1)
            (condition2)
            (condition3))  (action A))
      ((OR  (condition4)
            (condition5))  (action B)))

Now suppose that we want to learn to perform the task better. There are several
things that we might want to be able to do. We might discover a new special case
that requires a new action. The alternatives that are listed in a Lisp COND statement
are evaluated in the order in which they are written. So we could describe
the special case by adding a new branch at the beginning of the COND
expression, producing:

(COND ((condition6)        (action C))
      ((AND (condition1)
            (condition2)
            (condition3))  (action A))
      ((OR  (condition4)
            (condition5))  (action B)))

Note  that  we're  not  simply  claiming  that  a  programmer  can  make  this  change. 
A  Lisp  program  that  notices  that  the  change  is  necessary  can  use  CONS  to  add  the 
new  condition  to  the  front  of  the  list  of  branches.  Or  we  might  want  to  generalize 
the  behavior  of  our  program  so  that  it  works  in  some  additional  environments. 
One  way  to  do  that  would  be  to  change  the  AND  on  line  2  to  an  OR.  Another  way 
would be to remove one or more of conditions 1, 2, and 3. These changes can easily
be made by the learning program to the list representation of the performance
program.  Then  the  new  program  can  be  run  and  evaluated  to  see  whether  the 
change  improved  its  performance. 
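
The two edits described in this example can be sketched with nested Python lists standing in for the Lisp program. This is an analogue only: `add_special_case` and `generalize` are hypothetical helper names, and a real Lisp learner would use CONS and the other list primitives directly.

```python
program = ["COND",
           [["AND", ["condition1"], ["condition2"], ["condition3"]], ["action", "A"]],
           [["OR", ["condition4"], ["condition5"]], ["action", "B"]]]

def add_special_case(cond_expr, test, action):
    """Return a new COND whose first branch is [test, action] (compare CONS)."""
    head, *branches = cond_expr
    return [head, [test, action], *branches]

def generalize(cond_expr, branch_index):
    """Return a new COND in which the chosen branch's AND test becomes an OR."""
    head, *branches = cond_expr
    test, action = branches[branch_index]
    if test[0] == "AND":
        branches[branch_index] = [["OR", *test[1:]], action]
    return [head, *branches]

learned = add_special_case(program, ["condition6"], ["action", "C"])
learned = generalize(learned, 1)     # the AND branch is now second
print(learned[1])     # [['condition6'], ['action', 'C']]
print(learned[2][0])  # ['OR', ['condition1'], ['condition2'], ['condition3']]
```

Both helpers build new lists instead of mutating their input, so the original program remains available if the change turns out not to improve performance.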


EXAMPLE  G.10  Procedural  Knowledge 

The core of any AI program is its knowledge base. Some knowledge can naturally
be thought of as declarative. For example, John’s phone number is a string of digits,
his age is a number, his mother's name is a string, and his birthday wish list is a
list  of  objects.  But  in  real  applications,  the  values  for  even  these  simple  attributes, 
much  less  more  complex  ones,  may  not  always  be  known.  Instead,  what  may  be 
available  are  procedures  for  computing  values  as  they  are  needed.  Thus  it  may 
make  sense  to  store,  as  the  value  of  John’s  phone  number,  a  function  closure  that 
was  created  in  a  context  in  which  it  was  known  what  city  he  lives  in. The  function 
searches  that  city's  phone  book  to  find  John’s  number.  Similarly,  the  value  for 
John's  birthday  wish  list  might  be  a  procedure  that  executes  in  an  environment 
that  knows  his  hobbies  and  his  favorite  foods. 


As originally defined, Lisp was a purely functional language. All computation
occurred by evaluating functions. There existed no operations that caused side effects.
Most modern dialects have been extended to allow side effects in various ways. In
addition to input and output functions, there are typically functions that assign values to
variables and functions that destructively modify lists (as opposed simply to creating
new ones). There may also be a way to describe a sequence of actions that should be
performed. So most modern Lisps are not purely functional.

But  there  are  arguments  for  purely  functional  programming.  In  particular,  when 
side effects are allowed, constructing correctness proofs may be hard. In the years since
McCarthy’s original description of Lisp, a variety of other purely functional languages
have been defined. These languages, of which Haskell is probably the most widely
used,  owe  an  intellectual  debt  to  Lisp,  as  well  as  to  other  developments  in  the  area  of 
high-level  programming  language  design. 


Appendix H  Applications: Tools for Programming, Databases and Software Engineering

The  formal  structures  that  we  have  been  discussing  have  inspired  the  design  of 
many  different  kinds  of  tools  that  programmers  use  every  day.  We  have  already 
discussed  the  design  of  high-level  programming  languages  and  the  construction 
of  compilers  for  them.  In  Appendix  O,  we'll  discuss  the  use  of  regular  expressions  in 
modern  programming  environments.  In  this  appendix,  we’ll  briefly  describe  some  other 
kinds  of  useful  tools  whose  design  is  rooted  in  the  theory  that  we  have  described. 


H.1 Proving Correctness Properties of Programs and Hardware

Consider  the  problem  of  proving  that  a  particular  software  program  or  hardware  device 
correctly  implements  its  specification.  In  the  rest  of  this  discussion,  we  will  use  the  term 
“system”  to  describe  both  software  and  hardware  systems. 

If the answer to the Entscheidungsproblem that we introduced in Chapter 18 had
been yes (in other words, if there did exist a procedure to determine whether an arbitrary
sentence in first-order logic is valid), then it might be possible to:

1. Write a first-order logic sentence that corresponds to the specifications for a
system.

2. Write another first-order logic sentence that describes what the system actually
does. An effective way to do this is in two steps:

2.1. Define a set of first-order logic axioms that describe the primitive operations
that a system can perform. For example, we could describe the behavior of an
individual gate in a hardware circuit or an individual statement in some particular
programming language.

2.2. Derive from those axioms and the definition of a particular system (its
logic design or its code) the required sentence that describes the system's
behavior.

3.  Build  a  theorem  proving  program  that  could  determine  whether  the  sentence 
from step 2 entails the sentence from step 1. (In other words: Given a system that
behaves as described in the sentence from step 2, must the sentence from step 1
be  true?  Put  another  way:  Does  the  system  satisfy  the  specification?) 

But, as we saw in Chapter 19, the answer to the Entscheidungsproblem is no. That
result, proved independently by Turing and by Church in the mid 1930s, coupled with
the Incompleteness Theorem published by Kurt Gödel at about the same time, dashed
the hopes of mathematicians that they might find a completely syntactic basis for
mathematics. It also means that it won't be possible to build a completely automatic,
general-purpose, first-order logic-based verification system that can be guaranteed to
halt and return True precisely in case the target system satisfies its specification and
False otherwise.

Early interest in the Entscheidungsproblem was motivated by a concern with
issues  in  mathematics  and  philosophy.  Now,  with  the  advent  of  modern  computers,  we 
have  a  new  and  more  practical  need  for  a  syntactically-based  theorem-proving 
method:  We  build  huge  and  complex  pieces  of  hardware  and  software  and  we  trust 
them  to  perform  critical  functions.  If  we  could  build  programs  that  could  produce  for¬ 
mal  proofs  of  the  correctness  of  those  critical  systems,  we  would  have  an  increased 
basis  for  the  trust  we  place  in  them.  (We  say  an  increased  basis  for  the  trust,  rather 
than  total  trust,  because  there  would  still  be  issues  like  the  extent  to  which  the  formal 
specification  corresponds  to  our  goal  for  our  systems,  the  correctness  of  the  proof- 
generator  itself,  and  limits  to  our  ability  to  describe,  all  the  way  down  to  the  electrons, 
the behavior of the hardware on which our systems run.) Fortunately, the negative
results of Church and Turing do not doom all efforts to build mechanical verification
systems. It is true that we showed, in Chapter 21, that there are some program properties,
including some that could be part of many reasonable specifications, that are
undecidable. These properties include:

• Given a program P, does P halt on all inputs? Does P halt on some particular input
w? Does P halt on any inputs?

• Given a program P, does P ever output anything?

• Given two programs P1 and P2, are they equivalent?

Nevertheless,  there  are  useful  correctness  properties  that  can  be  proved,  at  least  of 
some  programs  and  devices.  For  example,  while  it  is  not  possible  to  decide  whether  an 
arbitrary  program  always  halts,  it  may  be  possible  to  prove  that  a  particular  one  does. 
To  construct  such  targeted  proofs,  we  require  a  logical  language  that  can  be  used  to 
represent  specifications  and  to  describe  the  behavior  of  systems.  In  particular,  we  must 
find  a  logical  language  that  meets  all  of  the  following  requirements. 

•  It  is  expressive  enough  to  make  it  possible  to  encode  both  specifications  and  descrip¬ 
tions  of  system  behavior. 

• Its decidability properties are strong enough to make it useful.

•  The  complexity  of  its  decision  procedures  is  acceptable  for  the  size  problems  we 
wish  to  solve. 

These issues trade off and there does not appear to be a single approach that works
best  for  all  kinds  of  problems.  But  there  are  two  general  approaches,  each  of  which  has 
proven  to  be  effective  for  some  classes  of  important  problems: 

• deductive verification systems, in which steps 1 through 3 are done but step 3 is typically
only semiautomatic: a human user must guide the theorem prover; and

•  model  checking  systems,  which  are  usually  fully  automatic  but  are  limited  to  rea¬ 
soning  about  systems  that  can  be  described  with  finite  (and  thus  decidable)  models. 


H.1.1 Deductive Verification

Deductive  verification  systems  find  proofs  in  much  the  same  way  mathematicians  do. 
The  core  of  all  such  systems  is  a  theorem  prover  that  begins  with  a  set  of  axioms  and 
then  applies  rules  of  inference  to  derive  conclusions. The  theorem  prover  may  also  be 
augmented  with  a  set  of  conventional  programs  that  perform  tasks  such  as  the  compu¬ 
tation  of  standard  arithmetic  operations.  Effective  verification  systems  must  cope  with 
two  realities  of  the  task: 


• expressively powerful logical systems (including standard first-order logic) are
undecidable, and

• the number of legal proofs grows exponentially with the length of the proof so,
even when a proof exists, a brute force approach to finding it will take too long.

Modern deductive verification systems solve those problems by choosing a carefully
designed logical language and then exploiting an interactive theorem prover. A
human  user  guides  the  theorem  prover  in  one  or  more  of  the  following  ways. 


•  The  user  describes  the  steps  (lemmas)  that  should  be  solved  on  the  way  to  a  com¬ 
plete  proof.  If  the  theorem  prover  is  unable  to  complete  a  step,  the  user  can  provide 
additional  information. 


•  The  user  tells  the  theorem  prover  what  substitutions  to  perform  as  the  variables 
and  constants  of  one  expression  are  matched  against  those  of  another. 


Interactive  verification  systems  of  this  sort  have  been  used  both  to  find  faults  and  to 
prove correctness in a wide variety of critical applications. Some examples include:


• The discovery of flaws in the design of control software for an observatory.

• The analysis of cryptographic protocols (e.g., Bluetooth) for security properties.


But the fact that people must be part of the verification process has meant that the
spread of this approach into practical system construction has been limited by the
scarcity of people who understand the required logical language and who are skilled at
providing the assistance that the theorem prover needs.


H.1.2  Model  Checking 

Suppose  that  the  system  (software  or  hardware)  whose  correctness  we  would  like  to  prove 
can be modeled with a finite number of states. Perhaps (as is often the case with hardware)
it was originally designed as a finite state machine. Or perhaps it was designed some other
way but its state can be described by a finite number of variables and each of those variables
can take on a finite number of values. In the latter case, it is straightforward to build
a finite state machine that describes the operation of the system. Many concurrent systems,
for  example,  can  be  modeled  in  this  way  since  there  are  typically  only  a  finite  number  of 
shared  variables,  each  of  which  can  be  described  as  taking  on  values  from  only  a  finite  set. 
Further,  it  is  sometimes  the  case  that,  although  in  principle  a  variable  may  take  on  some 
unbounded  set  of  values,  any  logical  errors  in  the  design  of  the  system  in  which  the  variable 
occurs  can  be  detected  by  considering  only  some  well-crafted  subset  of  those  values. 

When  it  is  possible  to  create  such  finite  system  descriptions,  a  powerful  class  of 
programs called model checkers ([Clarke and Emerson 1981], [Queille and Sifakis
1982], [Clarke, Grumberg and Peled 1999]) can be used to check for system correctness.
The basic idea is that we compare a finite description of our system to an
appropriately crafted logical statement that describes the system's specification.
The system is correct iff it satisfies (i.e., is a model of) the specification. Thus the
name model checking. Undecidability is not a problem for model checkers (since
finite descriptions can be described in Boolean logic, which is decidable). But combinatorial
explosion is a problem and current research in model checking is
focused on ways of reducing that explosion in practice.

The  primary  use  of  model  checkers  has  been  to  prove  the  correctness  of  digital  cir¬ 
cuits  and  concurrent  programs.  Since  those  systems  are  not  intended  to  halt,  the  exam¬ 
ples  we  will  present  in  this  section  all  describe  infinite  computations. 

The first step in using a model checker is to construct a finite state model of the
system whose correctness we wish to prove. Suppose that v is the set of variables in
the system. Then we create, in our model, one state for each possible assignment of
values to the variables in v. For example, suppose that v contains the variables x, y,
and z and each of them can take on the values 0 and 1. Then the model we build will
have eight states corresponding to the eight ways of assigning values to those variables.
So, for example, one such state would correspond to the valuation
(x = 0; y = 0; z = 0). More generally, we can describe each state by a set of atomic
propositions that are true in it. So, for example, we could use propositions such as
(x = 0). In the examples that follow, we will assume that all system variables are
binary and we will use propositions with names corresponding to the variables. The
proposition a will be true iff the variable a has the value True (or 1). This technique
can easily be extended to handle any variable that ranges over a finite number of
values by encoding those values in binary. When modeling real systems, it will often
happen that some states correspond to valuations that cannot occur. Since those
states will not be reachable from any start state, it is not necessary to represent them
in the model.
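
The construction of one state per valuation can be sketched in a few lines of Python. The representation of a state as a dictionary and the helper name `label` are choices made here for illustration, not part of any particular model checker.

```python
from itertools import product

variables = ["x", "y", "z"]

# One state for each assignment of 0 or 1 to each variable.
states = [dict(zip(variables, values))
          for values in product([0, 1], repeat=len(variables))]

def label(state):
    """Atomic propositions true in a state: the variables whose value is 1."""
    return frozenset(v for v, val in state.items() if val == 1)

print(len(states))                # 8
print(states[0])                  # {'x': 0, 'y': 0, 'z': 0}
print(sorted(label(states[-1])))  # ['x', 'y', 'z']
```

With n binary variables this produces 2^n states, which is the source of the combinatorial explosion mentioned earlier.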

The two other important things we need to do are to specify the system’s start state(s)
and to specify how it moves from one state to the next. So a complete model of a system
Y can be given as a five-tuple (called a Kripke structure) M = (S, S0, P, R, L), where:

• S is a finite set of states designed, as described above, to correspond to the possible
assignments of values to the variables of Y.

• S0, a subset of S, is the set of start states of Y.

• P is a non-empty set of atomic propositions that describe properties (such as the
fact that the variable x = 0) that may hold in the various states of S.

• R is a transition relation. It is a subset of S × S. The pair (q1, q2) ∈ R iff Y can go
directly from q1 to q2. Since Y does not halt, it must be the case that, for every state q1
in S, there exists at least one state q2 such that (q1, q2) ∈ R. Note that we do not
require that R be a function. So it is possible to model systems whose behavior is
nondeterministic.

• L is a function that labels each state in S with the set of propositions that are true in
it. So L maps from S to 𝒫(P), the power set of P.
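
The five components and their side conditions can be written down directly. The sketch below is one possible Python encoding, offered under assumptions: the class name `Kripke`, the method `is_well_formed`, and the two-state example system are all invented for illustration and do not come from the text.

```python
from dataclasses import dataclass

@dataclass
class Kripke:
    S: frozenset    # states
    S0: frozenset   # start states, a subset of S
    P: frozenset    # atomic propositions
    R: frozenset    # transition relation: a set of (q1, q2) pairs
    L: dict         # labeling: state -> set of propositions true in it

    def is_well_formed(self):
        """Check the side conditions stated in the definition."""
        return (self.S0 <= self.S
                and len(self.P) > 0
                and all(q1 in self.S and q2 in self.S for q1, q2 in self.R)
                # every state needs at least one successor (no halting)
                and all(any((q, q2) in self.R for q2 in self.S) for q in self.S)
                and all(self.L[q] <= self.P for q in self.S))

# Hypothetical system: a light that can turn on, and then stay on or turn off.
M = Kripke(S=frozenset({"off", "on"}),
           S0=frozenset({"off"}),
           P=frozenset({"lit"}),
           R=frozenset({("off", "on"), ("on", "off"), ("on", "on")}),
           L={"off": frozenset(), "on": frozenset({"lit"})})
print(M.is_well_formed())  # True
```

State "on" has two successors, so R is a relation and not a function, which is how nondeterministic behavior is modeled.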


EXAMPLE  H.1  Modeling  a  Simple,  Two-Switch  System 

Consider a very simple system with two switches. One is the a switch and it can be
on or off. The other is the b/c switch. It can be off or it can be thrown to b or c, but
that can happen only if it’s currently off. Once it’s ever thrown to c, it can't be
changed. The a switch can only go off if c is on. The system starts out with the a
switch on and the b/c switch off. We can model this system with the Kripke structure
M = ({{a}, {a, b}, {a, c}, {c}}, {{a}}, {a, b, c}, R, L), where L assigns to
each state the labels we’re using as the state’s name and R is given by:


A computation of a system Y, described by a Kripke structure M, is a path through
the states of M. Since computations don't halt, any such path will be infinite. Since a
state may have more than one successor, we can describe all paths that Y could follow
from some start state q as a computation tree rooted at q.
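
Because the full computation tree is infinite, a program can only build a finite prefix of it. The Python sketch below unrolls a transition relation to a chosen depth; the function name, the nested-tuple representation, and the three-pair relation are assumptions made for illustration and are not the relation of any example in the text.

```python
def computation_tree(R, q, depth):
    """Unroll the transition relation R from state q to the given depth.

    Returns a nested tuple (state, [subtrees]). Only a finite prefix of the
    infinite computation tree is built.
    """
    if depth == 0:
        return (q, [])
    successors = sorted(q2 for (q1, q2) in R if q1 == q)
    return (q, [computation_tree(R, s, depth - 1) for s in successors])

# Hypothetical transition relation over two states.
R = {("s0", "s1"), ("s1", "s0"), ("s1", "s1")}
tree = computation_tree(R, "s0", 2)
print(tree)  # ('s0', [('s1', [('s0', []), ('s1', [])])])
```

Each root-to-leaf path in the result is a prefix of one computation of the system.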


EXAMPLE H.2 The Two-Switch System's Computation Tree

Consider again the system we described in Example H.1. Its computation tree,
starting from state {a}, contains:


Steps  two  and  three  in  using  a  model  checker  are  to  state  the  specification  and  then 
to  show  that,  on  every  computation  path,  the  system  model  satisfies  the  specification. 
The  technique  that  is  used  to  perform  step  three  depends  on  the  language  that  is  used 
to  define  the  specification  in  step  two.  We’ll  consider  two  approaches: 

•  Use  a  temporal  logic  to  define  the  specification  and  apply  one  of  a  family  of  model 
checking  algorithms  to  compare  the  specification  to  the  Kripke  structure  that  mod¬ 
els  the  system. 

• Describe the specification as a Büchi automaton, convert the Kripke structure that
models the system into a second Büchi automaton, and use operations on automata
(complement,  intersection,  emptiness  checking)  to  decide  whether  the  system  satis¬ 
fies  the  specification. 

We  first  consider  writing  specifications  as  logical  formulas.  Typically,  the  specifica¬ 
tion  for  a  system  Y  imposes  constraints  on  the  computational  paths  that  Y  can  follow. 
For  example,  we  might  want  to  guarantee  that  Y  never  enters  a  state  that  corresponds 
to  the  situation  in  which  x  =  0  and  y  =  1.  Or  we  might  want  to  guarantee  that,  once  x 
becomes 0, y never does. To facilitate stating such requirements, the logical language
that  is  used  in  model  checking  systems  is  typically  some  form  of  temporal  logic,  in 
which  it  is  possible  to  describe  constraints  on  the  future  states  of  the  system  given  its 
current  state.  Formulas  in  temporal  logic  may  describe  properties  that  must  be  true  of 
states,  including  properties  that  must  be  true  along  paths  that  can  emerge  from  those 
states. 

There  are  two  main  kinds  of  temporal  logics  that  are  used  in  model  checkers: 

•  Linear  time  logics,  in  which  there  is  always  a  unique  future,  and 




• Branching time logics, in which, given a particular moment in time, multiple futures
are  possible.  Branching  time  logics  typically  provide  quantifiers  that  can  range  over 
paths.  Common  quantifiers  are: 

•  A  (for  all  computation  paths),  and 

•  E  (for  some  computation  path,  i.e.,  there  exists  some  computation  path). 

Temporal logics provide operators that can be applied to propositions. The following
operators are present in the branching time logical language CTL*, and are typical:

•  G  P,  which  holds  iff  P  is  always  true  (is  true  globally), 

•  F  P,  which  holds  iff  P  will  eventually  (at  some  time  in  the  future)  become  true, 

•  X  P,  which  holds  iff  P  holds  in  the  next  state, 

• P1 U P2, which holds iff P2 eventually becomes true and, at every state until then, P1
is true (P1 until P2), and

• P1 R P2, which holds iff P2 holds in every state up to and including the first state in
which P1 is true (P1 releases P2). It is possible that P1 may never become true, however.
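To make the operator definitions concrete, here is a minimal Python sketch that evaluates them along a single infinite path. It assumes the path is ultimately periodic (a finite prefix followed by a loop that repeats forever), and it takes propositions to be simple predicates on states rather than nested temporal formulas. The class name `Lasso` and the helper `has` are hypothetical.

```python
class Lasso:
    """An infinite path s0 s1 ... s(n-1) that then loops back to s(loop_start) forever."""

    def __init__(self, states, loop_start):
        self.states = states
        self.loop_start = loop_start

    def _next(self, i):
        return i + 1 if i < len(self.states) - 1 else self.loop_start

    def _from(self, i):
        """Distinct positions visited from position i onward, in order of first visit."""
        seen = []
        while i not in seen:
            seen.append(i)
            i = self._next(i)
        return seen

    def G(self, p, i=0):
        return all(p(self.states[j]) for j in self._from(i))

    def F(self, p, i=0):
        return any(p(self.states[j]) for j in self._from(i))

    def X(self, p, i=0):
        return p(self.states[self._next(i)])

    def U(self, p1, p2, i=0):
        for j in self._from(i):
            if p2(self.states[j]):
                return True       # p2 became true, and p1 held until then
            if not p1(self.states[j]):
                return False
        return False              # p2 never becomes true

    def R(self, p1, p2, i=0):
        for j in self._from(i):
            if not p2(self.states[j]):
                return False
            if p1(self.states[j]):
                return True       # p2 held up to and including the first p1 state
        return True               # p1 never becomes true, but p2 holds forever

def has(x):
    """Proposition: x is among the labels of the state."""
    return lambda state: x in state

A, AC, C = frozenset("a"), frozenset("ac"), frozenset("c")
path = Lasso([A, AC, C], loop_start=1)   # {a}, {a, c}, {c}, {a, c}, {c}, ...
a, c = has("a"), has("c")
print(path.F(c), path.G(c), path.X(c), path.U(a, c), path.R(c, a))
```

Checking only the distinct positions is enough here because each proposition depends on the state alone, so revisiting a position on the loop can never change the answer.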


EXAMPLE  H.3  Some  Simple  Specifications  of  the  Two-Switch  System 

We return again to the system we described in Example H.1. This time we'll consider
some possible specifications and see whether the system satisfies them:


a ∨ c

This holds in all states.

EG (a ∨ c)

(There exists a path such that a ∨ c always.)

This holds in all states.

EF c

(There exists a path such that eventually c.)

This holds in all states.

EG c

(There exists a path such that c always.)

This holds only in {a, c} and {c}.

E c R a

(There exists a path where a until released by c.)

This holds in all states except {c}.
Now we have, for a system Y, a Kripke structure M that describes Y's implementation
and a temporal logic formula f that describes the requirements (specifications) for
Y. The final step in determining the correctness of Y is to decide whether the
implementation conforms to the specification. We would like to prove that there is no path
through M that fails to satisfy f. If, on the other hand, there is such a path, we would like
to report it. The fact that model checkers can do more than "just say no" is one reason
that they are particularly useful in practice: A counterexample tells the system's
developers exactly where the system can fail and thus points the way to a solution to the
problem. Further, if a specification requires that there exist a path with some desirable
property, the model checking process will find and report such a "witness."

The most straightforward algorithms for model checking work with an explicit
representation of the Kripke structure M = (S, S0, R, P, L) that describes the
implementation. The idea is that we will consider the states in S and we will annotate each of
them with those subformulas from the specification f that can be shown to hold in it.
The annotation process begins with the labels that are already attached to the states by
the labeling function L. Then it considers the subformulas in f, starting with the most
primitive, builds up longer and longer annotations, and propagates them through M. If
all subformulas hold in all start states, then all computation paths satisfy f.
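The annotation idea can be sketched in Python for a few branching time operators. This is a simple fixpoint sketch, not the linear-time algorithm, and it rests on an assumed transition relation R for the two-switch system (the figure defining R is not reproduced in this text; the relation below was chosen to agree with the results the examples state). Each function returns the set of states that would be annotated with the corresponding subformula; the function names are hypothetical.

```python
A, AB, AC, C = (frozenset(s) for s in ("a", "ab", "ac", "c"))
# ASSUMED transition relation (hypothetical; consistent with the stated examples).
R = {A: [AB, AC], AB: [A, AC], AC: [A, AB, C], C: [AC]}
S = set(R)

def sat(prop):
    """States whose labels include the atomic proposition prop."""
    return {s for s in S if prop in s}

def pre_exists(target):
    """States that have at least one successor in target."""
    return {s for s in S if any(t in target for t in R[s])}

def EF(target):
    """Least fixpoint: target, plus every state that can step into the set."""
    result = set(target)
    while True:
        new = result | pre_exists(result)
        if new == result:
            return result
        result = new

def EG(target):
    """Greatest fixpoint: states in target that can stay inside the set forever."""
    result = set(target)
    while True:
        new = result & pre_exists(result)
        if new == result:
            return result
        result = new

def ER(p1, p2):
    """E(p1 R p2): p2 holds up to and including the first p1 state, or forever."""
    result = set(p2)
    while True:
        new = p2 & (p1 | pre_exists(result))
        if new == result:
            return result
        result = new

print("EF c   :", len(EF(sat("c"))), "states")
print("EG c   :", EG(sat("c")) == {AC, C})
print("E c R a:", ER(sat("c"), sat("a")) == S - {C})
```

Under the assumed R, these sets match the annotations listed for the two-switch system: EF c holds everywhere, EG c holds only in {a, c} and {c}, and E c R a holds everywhere except {c}. Nesting the calls, as in `EF(EG(sat("c")))`, annotates the states from which some path eventually reaches a point where c can stay true.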


EXAMPLE H.4 Evaluating a Specification of the Two-Switch System

We'll continue with the same example system. Suppose that the specification for it
is EFG c. (In other words, from the start state there exists a path along which c will
eventually become true and stay true.)


• c holds in state {c}.

• EG c also holds in {c}.

• Thus EG c also holds in {a, c}.

• Thus EFG c holds in {a}, which is the only start state, so this implementation
satisfies EFG c.
The details of any model checking algorithm depend on the temporal logic language
that is used for the specification. But, for example, for CTL, a widely used fragment of
CTL*, the language we discussed above, there exists a model checking algorithm whose
complexity is O(|f| · (|S| + |R|)), where f is the formula that describes the specification.
So it is linear in both the size of the formula f and the size of the Kripke structure M.
For large systems, though, this isn't good enough because the number of states in M may
be O(2^v), where v is the number of variables in the system that M is modeling.

To solve this problem we need a technique that does not require the explicit construction of
M before we can start. Suppose that instead of describing a system Y as a set of states and
transitions between them, we could describe Y as a Boolean function. Then we could use an ordered
binary decision diagram (OBDD), as described in B.1.3, as an efficient way to represent Y.

To start, we'll describe each state as the Boolean function that describes the condition
under which Y is in that state. So, for example, suppose that there are three atomic propositions
(v1, v2, v3) in our model. Then we'll represent the state shown in Figure H.1(a) as
v1 ∧ ¬v2 ∧ v3. Now consider any transition in a Kripke structure, as for example the transition
from state (1) to state (2) shown in Figure H.1(b). We can think of this transition as a relation
between the sets of propositions that are true in state (1) and those that are true in state (2).

To describe it that way, we need a second set of proposition names. We'll use the
original set A to describe the propositions that hold in state (1) and a new set A' to
describe those that hold in state (2). Then we can describe the transition relation using
its characteristic function (i.e., a Boolean function whose value is True for each
element of the relation and False otherwise). Using this idea, we construct, for the single
transition shown above, the Boolean function:


v1 ∧ ¬v2 ∧ v3 ∧ v1' ∧ v2' ∧ ¬v3'.

From the functions that describe individual transitions, we can construct a single
function that is true for an entire transition relation. It is true whenever any of the
functions for the individual transitions is true. So, for the simple two-state, two-transition
structure of Figure H.1(c), we can construct the Boolean function:

(v1 ∧ ¬v2 ∧ v3 ∧ v1' ∧ v2' ∧ ¬v3') ∨ (v1 ∧ v2 ∧ ¬v3 ∧ v1' ∧ v2' ∧ ¬v3').
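A characteristic function of this kind can be sketched directly as a Python predicate over the current variables and their primed copies. The sketch assumes that the two-state structure has a transition from state (1) to state (2) and a self-loop on state (2), and that state (2) is the assignment v1 true, v2 true, v3 false; the function names are hypothetical.

```python
from itertools import product

def state1(v1, v2, v3):
    """State (1): v1 and not v2 and v3."""
    return v1 and not v2 and v3

def state2(v1, v2, v3):
    """State (2), ASSUMED to be: v1 and v2 and not v3."""
    return v1 and v2 and not v3

def transition(v, v_next):
    """Characteristic function of the relation {(1, 2), (2, 2)}.

    v holds the unprimed variables (current state) and v_next the primed
    ones (next state). The result is True exactly for pairs in the relation.
    """
    return (state1(*v) and state2(*v_next)) or (state2(*v) and state2(*v_next))

bits = [False, True]
pairs = [
    (v, w)
    for v in product(bits, repeat=3)
    for w in product(bits, repeat=3)
    if transition(v, w)
]
print(len(pairs), "of", 2 ** 6, "assignments satisfy the function")
```

Enumerating all 64 assignments, as done here, is exactly the explicit work that a symbolic representation is meant to avoid; the loop is only a check that the function is true for the two intended transitions and for nothing else.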


FIGURE H.1 States can be described as Boolean expressions.
This representation doesn't look smaller than the original state model, but that's
primarily because the original state model was already small. In cases where the state
model is enormous, the OBDD representation of the corresponding Boolean function
is often small and it is possible to construct that representation directly from Y's
description (as a logic circuit or as a program), without first creating the Kripke
structure (and thus representing all possible combinations of variable values explicitly).
This insight has led to the development of a more powerful class of verification
programs called symbolic model checkers, which exploit system models described as
Boolean functions, which in turn are represented as OBDDs.

As we pointed out in B.1.3, there are some systems (e.g., an n-bit multiplier) whose
OBDD representation grows as O(2^n). But most useful systems don't have that
property and symbolic model checkers are now routinely used to prove the correctness of
systems whose Kripke structure contains 10^20 or more states.

An alternative approach to model checking is automata-based. In this approach, both
the specification and the Kripke structure that describes the implementation are described
as Büchi automata (automata that accept infinite strings, as described in Section 5.12). We
use Büchi automata here because strings that correspond to computations will be infinite.

Representing many kinds of specifications as Büchi automata is straightforward. Recall
that, in Example 5.39, we showed the Büchi automaton for a single simple requirement,
mutual exclusion. We can do the same thing for many other important properties.


EXAMPLE H.5 A Büchi Automaton for a Simple Liveness Property

The following Büchi automaton corresponds to a simple liveness property. As in
Example 5.39, we use the atomic propositions: CR0 (process0 is in its critical
region) and CR1 (process1 is in its critical region). This time, we require that
process0 eventually enter its critical region:

[Figure: a Büchi automaton whose transitions are labeled True and CR0.]


Next we need a Büchi automaton that corresponds to an implementation. Converting
a Kripke structure M = (S, S0, R, P, L) for a system Y into a Büchi automaton B
that accepts all and only those strings that correspond to computations of Y is
straightforward, as shown in the following example. We label each transition with the
condition under which it can be taken if we are to guarantee that its annotation holds. We
create a new start state in which no propositions are true and transitions from it to all
elements of S0. All states are accepting states. Any sequence that would correspond to
a path that Y cannot take will simply die.
EXAMPLE H.6 Converting a Kripke Structure to a Büchi Automaton

Returning again to our two-switch example, we can convert the Kripke structure
to a Büchi automaton as shown here. As we did in labeling the states, we will label
the transitions with the propositions that must be true. All others must be false.
So, for example, a ∧ b should be read as equivalent to a ∧ b ∧ ¬c.
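The conversion can be sketched in a few lines of Python. The sketch uses an assumed transition relation R for the two-switch system (the figure is not reproduced in this text), labels each transition with the set of propositions that hold in its target state, adds a new start state, and makes every state accepting. The names `kripke_to_buchi`, `surviving_states`, and `START` are hypothetical.

```python
A, AB, AC, C = (frozenset(s) for s in ("a", "ab", "ac", "c"))
S0 = [A]
# ASSUMED transition relation (hypothetical; not copied from the book's figure).
R = {A: [AB, AC], AB: [A, AC], AC: [A, AB, C], C: [AC]}

START = "start"   # new start state, in which no propositions are true

def kripke_to_buchi(start_states, relation):
    """Return (delta, accepting) for the Büchi automaton of a Kripke structure.

    delta maps each automaton state to a list of (label, target) pairs, where
    the label is the exact set of propositions that hold in the target state.
    """
    delta = {START: [(q, q) for q in start_states]}
    for s, targets in relation.items():
        delta[s] = [(t, t) for t in targets]
    accepting = set(delta)            # all states are accepting
    return delta, accepting

def surviving_states(delta, word):
    """States the automaton can be in after reading a finite prefix of labels."""
    current = {START}
    for letter in word:
        current = {t for q in current for (label, t) in delta[q] if label == letter}
    return current

delta, accepting = kripke_to_buchi(S0, R)
print(surviving_states(delta, [A, AC, C, AC]))   # a prefix the system can follow
print(surviving_states(delta, [A, C]))           # empty: this sequence dies
```

A sequence of labels that the system cannot follow leaves no surviving states, which is the sense in which such a sequence "dies."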


Given two Büchi automata, B_SPEC, which corresponds to Y's specification, and
B_IMP, which corresponds to the Kripke structure that describes an implementation,
we're ready to decide whether B_IMP satisfies B_SPEC. We proceed as follows: Construct
B_BAD, a Büchi automaton that accepts the complement of L(B_SPEC). So B_BAD accepts
exactly the strings that violate the specification. Next, construct B_BOTH, a Büchi
automaton that accepts the intersection of L(B_BAD) and L(B_IMP). Finally, test whether
L(B_BOTH) is empty. If not, then there are computation sequences that are possible
in the implementation but that are not allowed by the specification. If it is empty, then
there are no such computations and we have proved that the system satisfies its
specification. Note that this procedure works because the class of languages accepted by
Büchi automata is closed under both complement and intersection and there exists a
decision procedure for the emptiness question. It is also possible to skip the complement
step if the user enters the negative specification directly.
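The procedure can be illustrated with a small Python sketch for one family of specifications, "some proposition eventually holds." It skips the complement step by entering the negative specification ("the proposition never holds") directly, forms the product with an implementation automaton, and tests emptiness by looking for a reachable cycle. The implementation automaton is built from an assumed transition relation for the two-switch system, and all names (`IMP`, `counterexample`, and so on) are hypothetical.

```python
from collections import deque

A, AB, AC, C = (frozenset(s) for s in ("a", "ab", "ac", "c"))

# B_IMP for the two-switch system, built from an ASSUMED transition relation.
# Each entry maps a state to (label, target) pairs; every state is accepting.
IMP_START = "start"
IMP = {
    IMP_START: [(A, A)],
    A: [(AB, AB), (AC, AC)],
    AB: [(A, A), (AC, AC)],
    AC: [(A, A), (AB, AB), (C, C)],
    C: [(AC, AC)],
}

def counterexample(prop):
    """Check the specification 'prop eventually holds' against B_IMP.

    B_BAD is entered directly as the negative specification 'prop never
    holds': a single accepting state that survives only on labels that do
    not contain prop. Returns (prefix, cycle) describing a violating
    computation, or None if L(B_BOTH) is empty.
    """
    def successors(state):
        p, q = state
        # Product step: B_IMP moves on label; B_BAD survives only if prop is absent.
        return [(p2, q) for (label, p2) in IMP[p] if prop not in label]

    def find_path(sources, goal):
        """Breadth-first search from any source to goal; state list or None."""
        parent = {s: None for s in sources}
        queue = deque(sources)
        while queue:
            s = queue.popleft()
            if s == goal:
                path = []
                while s is not None:
                    path.append(s)
                    s = parent[s]
                return path[::-1]
            for t in successors(s):
                if t not in parent:
                    parent[t] = s
                    queue.append(t)
        return None

    start = (IMP_START, "bad")
    reachable = [start]
    for s in reachable:                  # the list grows as we scan it
        for t in successors(s):
            if t not in reachable:
                reachable.append(t)
    # Every product state is accepting here, so the language is nonempty
    # exactly when some reachable state lies on a cycle.
    for s in reachable:
        cycle = find_path(successors(s), s)
        if cycle is not None:
            return find_path([start], s), cycle
    return None

print(counterexample("c"))   # a computation along which c never holds
print(counterexample("a"))   # None: a holds in the first state of every computation
```

In general the product of two Büchi automata needs extra bookkeeping to track both acceptance conditions; the simple product used here is enough only because every state of both automata is accepting.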

Using these techniques, model checkers have been used to:

•  Prove  the  correctness  of  general-purpose  processor  chips. 

•  Prove  the  correctness  of  special-purpose  processors  such  as  for  video  game  consoles. 
• Prove that a logic optimizer has not altered the functionality of a circuit. This is
done  by  showing  that  the  functions  for  the  two  circuits  (the  original  one  and  the 
optimized  one)  are  identical. 

•  Prove  the  correctness  of  network  protocols,  including,  for  example,  the  alternating 
bit  protocol  described  in  1.1.2. 

• Prove the correctness of critical real time systems such as the controllers for
aircraft and space exploration vehicles. For example, the SPIN model checking
system found five previously undetected concurrency errors in the plan execution
module of the controller for a spacecraft that NASA launched in 1998.

H.2  Statecharts:  A  Technique  for  Specifying 
Complex  Systems 

Consider the following way of dividing systems into two important classes (as described
in [Harel 1987]):

• Reactive systems are driven by (possibly asynchronous) sequences of external
events. So, for example, the telephone system, your watch, your car, your microwave,
your Web search engine, and your operating system are all reactive systems.
Reactive systems typically have little control over the sequences of inputs that they may
receive.

• Transformational systems, on the other hand, typically have more control over
their inputs. They accept inputs in particular forms and compute functions of them.
For example, a payroll system accepts an input file and outputs a set of checks.

While the distinction between these two kinds of systems is not hard and fast, it is
useful. In particular, it highlights one reason that designing reactive systems is hard
(the fact that arbitrary sequences of inputs must be handled properly). And it suggests
a way to build tools that are particularly well-suited for the design of those systems: Let
the tool provide explicit support for describing the way that the system's state changes
as inputs are received.

In Chapter 5 we used this approach to system design when we built finite state machines.
But what about real systems with real complexity? Can they too be modeled with a finite
number of states? It turns out that often they can. And, in other cases, they can be modeled
with a finite number of states plus a set of variables that can take on arbitrary values.

A family of design tools based on statecharts, as described in [Harel 1987] and
[Harel and Politi 1998], exploits this observation. A statechart is a hierarchically
structured finite state machine. The hierarchical structure makes it possible to:

•  view  a  system  design  at  whatever  level  of  detail  is  necessary. 

• describe a system design using fewer transitions than would be required in a flat
structure. This happens because a single transition from a parent state implicitly
describes a whole family of transitions from its child states.

Statecharts differ from finite state machines, as we've been using them, in one
additional important way: Suppose that a complex reactive system is made up of a set of
components that act independently (or mostly independently) of each other as they
respond to different kinds of input signals. For example, imagine a cell phone that must
simultaneously listen for incoming calls while allowing its user to manage the local
phonebook. The "state" of such a system must reflect the states of all of its
components. Rather than forcing an explicit enumeration of all such state combinations as a
way to describe the overall system's state, statecharts allow the specification of parallel
state sets. A complex system can then be described as being in multiple states at once
and the total number of states that are required in the description becomes the sum of
the numbers of the separate states, rather than their product.

Statecharts have been widely used in a variety of software design contexts, including
real-time systems, simulations, and user interfaces. Statechart capabilities are now part
of many software design and implementation systems, including general-purpose tools
such as the Unified Modeling Language (UML), as well as specialized languages
that have been crafted to support the design of particular kinds of reactive systems. For
example, SCXML is a statechart-based tool that supports the design of voice and
multimodal interfaces. The details of how one specifies a set of states vary from one
system to the next. Typically there exists both a graphical language and a text-based
one (generally based on some form of XML). In the example that we are about to
present, we use a representative kind of graphical language.

To see how statecharts work, consider the problem of designing a digital watch.
We'll substantially simplify the problem, which is described in much more realistic
detail in [Harel 1987]. Statecharts can be used to construct designs either top-down or
bottom-up. We'll sketch a top-down approach. At the top level, our watch has three
states, shown in Figure H.2.

When the watch is turned on, it enters the displaying state, in which it displays the
date and time. If an alarm is triggered, it will enter the alarm-beeping state, where it
will stay until some button (any button) is pushed. When that happens, it will return to
the displaying state. If the set button is pushed from the displaying state, the watch
enters the setting state, in which the date, time, and alarms can be set. When the done
button is pushed, it returns to the displaying state. If an alarm is triggered while the
watch is in the setting state, it will immediately enter the alarm-beeping state. The only
way to return to the setting state is to enter it in the usual way. So the watch will forget
any settings that have not yet been saved. If the set button is pushed while an alarm is
beeping, it will be ignored (since there is no transition labeled set button pushed from
the alarm-beeping state).


FIGURE H.2 A simple, top-level model of a watch.
FIGURE H.3 Zooming in to the alarm-beeping state.


One might easily take issue with several things about the design that we have just
described. For example, perhaps, if the set button is pushed from the alarm-beeping
state, the watch should go directly to the setting state. The point of this example is not
to argue for this particular design. It is to show the way that a statechart makes clear
what a particular design is and what decisions were made in constructing it.

Of course, to build a watch, we need a more detailed design than the one we have
just presented. To provide it, we must zoom in to each of the top-level states. Zooming
in to the alarm-beeping state, we might see the statechart shown in Figure H.3.

Now we see that this watch has two separate alarms, either of which may be set.
We've used the symbol a1 to mean that the first alarm has been triggered and a2 to
mean that the second one has. Notice the way in which we used the statechart's
hierarchical structure mechanism to reduce the number of transitions, compared to the
number that would be required in a flat machine: The transition from the alarm-beeping
state back to the displaying state does not need to be duplicated for both of the alarm
substates. Instead it is attached once to the parent state. Transitions are assumed to be
inherited, unless overwritten, from parent state to child states.
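Inheritance of transitions from a parent state can be sketched in Python with a parent map and a lookup that walks up the hierarchy. The substate names `alarm1-beeping` and `alarm2-beeping`, the event strings, and the rule that any event ending in "button pushed" counts as a button press are assumptions made for this illustration; they are not taken from Figure H.3.

```python
PARENT = {
    "displaying": None,
    "setting": None,
    "alarm-beeping": None,
    "alarm1-beeping": "alarm-beeping",   # hypothetical substate names
    "alarm2-beeping": "alarm-beeping",
}

TRANSITIONS = {
    ("displaying", "a1"): "alarm1-beeping",
    ("displaying", "a2"): "alarm2-beeping",
    ("setting", "a1"): "alarm1-beeping",
    ("setting", "a2"): "alarm2-beeping",
    ("displaying", "set button pushed"): "setting",
    ("setting", "done button pushed"): "displaying",
    # Attached once, to the parent state; both alarm substates inherit it.
    ("alarm-beeping", "any button pushed"): "displaying",
}

def step(state, event):
    """Follow the transition for event, searching from state up through its ancestors."""
    generic = "any button pushed" if event.endswith("button pushed") else None
    s = state
    while s is not None:
        for e in (event, generic):
            if (s, e) in TRANSITIONS:
                return TRANSITIONS[(s, e)]
        s = PARENT[s]
    return state   # no applicable transition: the event is ignored

print(step("alarm1-beeping", "set button pushed"))   # inherited from alarm-beeping
print(step("displaying", "done button pushed"))      # ignored
```

A substate could override the inherited behavior simply by having its own entry in `TRANSITIONS`, since the lookup checks the most specific state first.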

Now suppose that the watch has a background light, which can be in one of two
states, on or off. By default, it will be off, but it will go on if either the light button or
the set button is pressed (the latter on the assumption that the user may need the light
in order to see well enough to perform any settings). So now, at any point, the watch is
in some state within its main controller, as sketched in Figure H.2, and it is in one of
the light states. We could model that by creating a second copy of the main control
box, with one copy corresponding to the light being on and the other corresponding to
it being off. But that doubles the number of states. Instead, we can exploit the ability
of statecharts to represent (nearly) orthogonal state sets. Orthogonal sets will be
separated by a dashed line. If a model contains orthogonal state sets, it should be read to
say that, at any point, the system is in one state from each of those sets.

FIGURE H.4 Describing the watch as two orthogonal state sets.


The new model, shown in Figure H.4, has five states. That's only one fewer than we
would have had if we had enumerated all the combinations. But imagine a more realistic
model, in which each of the components had contained 1,000 states. Representing
all combinations of them would require 1,000,000 states. Using orthogonal state sets, as
we just did, that model could be described with just 2,000 states. Note, though,
that we have not thrown away the ability to describe interactions between/among
orthogonal states when they occur. So, in our simple example, a single event, set button
pushed, can cause a state change in both components of the model.
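Orthogonal state sets can be sketched in Python as two independent transition tables whose current states are updated together. The event strings are assumptions, and because the text does not say when the light goes off, the sketch gives the on state no outgoing transitions.

```python
MAIN_STATES = ["displaying", "setting", "alarm-beeping"]
LIGHT_STATES = ["off", "on"]

MAIN = {
    ("displaying", "set button pushed"): "setting",
    ("displaying", "alarm trigger"): "alarm-beeping",
    ("setting", "done button pushed"): "displaying",
    ("setting", "alarm trigger"): "alarm-beeping",
}
LIGHT = {
    ("off", "light button pushed"): "on",
    ("off", "set button pushed"): "on",
}

def step(config, event):
    """Advance both orthogonal components on the same event."""
    main, light = config
    if main == "alarm-beeping" and event.endswith("button pushed"):
        new_main = "displaying"              # any button stops the beeping
    else:
        new_main = MAIN.get((main, event), main)
    new_light = LIGHT.get((light, event), light)
    return (new_main, new_light)

print(step(("displaying", "off"), "set button pushed"))   # both components change

# States that must be described: the sum, not the product.
print(len(MAIN_STATES) + len(LIGHT_STATES), "described,",
      len(MAIN_STATES) * len(LIGHT_STATES), "combinations")
```

The configuration is a pair, so the system is always in one state from each set, while the description itself lists only the five component states.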


H.3 Model-Based Test Case Generation

We've now seen two uses for models that describe the desired behavior of systems that
we wish to build:


• Models can be used to verify the correctness of an implementation.

• Models can be exploited as design tools.

We'll now briefly mention a third:


• Models can be used to generate test suites.

The goal of test case generation is to construct a set of tests that, collectively, increase
our confidence that the behavior of a system under test conforms to some set of desired
properties. Suppose that we have a model that describes those properties. Then it is
possible to exploit that model in an automated tool for generating test cases. A variety of
techniques can be used to do this; the details depend, among other things, on the
formalism in which the model is written. We'll mention just one idea here.

Suppose that we have a finite model of the sort that can be used as input to a model
checker. If the model checker discovers a counterexample (i.e., an execution path that
does not satisfy the specification), we know that there is a bug and can go about fixing
it. But the model checker will typically find many paths, called witnesses, that do
satisfy the specification. Unfortunately, we are still not certain that no bugs occur along
those paths since the system model is almost always an abstract description that
omits many details of the system's implementation. It is those details that need to be
checked by the tests. One way to generate test cases is to choose inputs that force the
system under test down the witness paths.
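As a rough sketch of this idea, the Python code below searches a small finite model for a witness path to a goal state and turns the inputs along that path into a test case. The model is the top-level watch design with assumed event names, and `WatchUnderTest` is a hypothetical stand-in for a real implementation; here it simply replays the model, so the test trivially passes.

```python
from collections import deque

# Finite model: state -> {input event: next state}. Event names are assumed.
MODEL = {
    "displaying": {"set button pushed": "setting", "alarm trigger": "alarm-beeping"},
    "setting": {"done button pushed": "displaying", "alarm trigger": "alarm-beeping"},
    "alarm-beeping": {"any button pushed": "displaying"},
}

def witness(start, goal):
    """Shortest input sequence that drives the model from start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, inputs = queue.popleft()
        if state == goal:
            return inputs
        for event, nxt in MODEL[state].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, inputs + [event]))
    return None

def make_test(start, goal):
    """A test case: the inputs along a witness path plus the expected final state."""
    return {"inputs": witness(start, goal), "expected": goal}

class WatchUnderTest:
    """Hypothetical stand-in for the real implementation being tested."""
    def __init__(self):
        self.state = "displaying"
    def press(self, event):
        self.state = MODEL[self.state].get(event, self.state)

def run(test, system):
    """Force the system down the witness path and compare the outcome."""
    for event in test["inputs"]:
        system.press(event)
    return system.state == test["expected"]

test = make_test("displaying", "alarm-beeping")
print(test, run(test, WatchUnderTest()))
```

With a real system under test, `press` would drive the actual implementation, and a mismatch between its final state and the model's expected state would point to a bug in details that the abstract model leaves out.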
H.4 Reverse Engineering

Engineers start with specifications and build artifacts. Reverse engineers start with
artifacts and attempt to reconstruct specifications (and, often, various levels of
structures that implement those specifications). One can try to reverse engineer just about
anything. A common quip is that physics is an attempt to reverse engineer the universe.
Molecular biology is an attempt to reverse engineer the genetic code. Software pirates
reverse engineer their competitors' code.

Why do reverse engineering? Among the possible answers to this question are:

•  We  just  want  to  know.  Physicists  probably  relate  to  this  one. 

• If we understood an artifact better, we could use it more effectively. Physicists and
engineers  relate  to  this  one.  But  also  consider  a  piece  of  software  for  which  no  one 
has  yet  bothered  to  write  a  manual  or  a  piece  of  software  with  undocumented  fea¬ 
tures.  Suppose,  for  example,  that  we  want  a  system  we  are  building  to  share  files 
with  another  system  but  the  group  that  built  the  other  system  never  described  the 
internals  of  their  file  structure.  If  we  could  reverse  engineer  that  other  system,  we 
could  discover  how  its  file  structure  works. 

•  We've  got  an  artifact  that’s  broken  and  we  want  to  understand  its  structure  so  we 
can  fix  it.  This  one  drives  research  in  molecular  biology.  It  also  comes  up  a  lot  with 
legacy  software,  which  can  easily  become  obsolete  even  if  it  doesn't  directly 
“break".  For  example,  the  number  of  bits  allocated  to  some  field  may  no  longer  be 
enough.  But  we  don’t  know  what  the  consequences  would  be  if  we  changed  it. 

•  We've  got  an  artifact  that  is  old  and  clunky.  We  want  to  replace  it  with  a  newer, 
sleeker  version  but  first  we  have  to  figure  out  exactly  what  it  does.  This  one  comes 
up  all  the  time  with  legacy  software. 

•  We  want  to  steal  our  competitors'  ideas. This  is  why  we  have  patent  law. 

We'll  focus  here  on  the  specific  problem  of  reverse  engineering  of  software. The  prob¬ 
lem  is  one  of  analysis  and  the  artifacts  we  need  to  analyze  are  strings.  So  this  seems  like 
a  natural  application  for  many  of  the  ideas  that  we  have  been  discussing.  We'll  briefly 
mention  two  techniques,  one  based  on  extending  our  notion  of  regular  expressions,  the 
other  on  the  use  of  island  grammars. 

H.4.1  Hierarchical  Regular  Expressions 

In Section 11.9, in our introduction to island grammars, we sketched some of the problems
that  arise  when  we  try  to  analyze  legacy  software.  In  a  nutshell,  any  approach  that  re¬ 
quires  exact  matching  will  probably  fail  because  of  errors,  the  interleaving  of  different 
languages,  dialect  differences,  and  irrelevant  code,  among  other  things.  Further,  there 
are  applications  that,  by  their  nature,  need  to  be  fast  and  cheap.  For  example,  suppose 
that  we  are  trying  to  analyze  code  in  order  to  figure  out  how  expensive  it  would  be  to 
make  some  proposed  change.  We  require  that  this  feasibility  analysis  be  cheaper  and 
faster  than  the  more  complete  analysis  that  will  probably  be  required  if  we  decide  to  go 
ahead  with  the  update.  What  we  need  is  a  good  technique  for  what  is  often  called 
lightweight analysis.

A robust lightweight analysis tool needs to be flexible. It needs to be able to find
instances of patterns that describe the parts of the code that matter. And it needs to be
able to ignore the rest. Regular expressions are very good at this. For example, we
could write the regular expression:

(a ∪ b)* aaabba (a ∪ b)*,

which  will  match  any  occurrence  of  aaabba,  while  skipping  over  everything  else. 
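In a practical notation, the same expression might be written as follows with Python's `re` module. The character class `[ab]` plays the role of (a ∪ b); the function names are hypothetical.

```python
import re

# [ab]* skips over everything before and after the occurrence of aaabba.
PATTERN = re.compile(r"[ab]*aaabba[ab]*")

def contains_target(s):
    """True iff s is a string over {a, b} that contains aaabba."""
    return PATTERN.fullmatch(s) is not None

def locate(s):
    """Index of the first occurrence of aaabba in s, or -1 if there is none."""
    m = re.search(r"aaabba", s)
    return m.start() if m else -1

print(contains_target("bbaaabbab"), locate("bbaaabbab"))
print(contains_target("ababab"), locate("ababab"))
```

Using `fullmatch` keeps the pattern faithful to the expression above, which describes whole strings over the alphabet {a, b}; `search` is the more common idiom when only the location of the match matters.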

But, in a realistic software engineering environment, the regular expressions that we
would have to write to get the job done become too complex to work with. One
approach to solving this problem [Murphy and Notkin 1996] is to organize the regular
expressions hierarchically. In this approach, we still allow only regular expressions that
can be compiled into finite state machines that can, in turn, be used to implement the
actual search process. This means that the search process can be fast. So the use of
regular expressions that we are suggesting here contrasts with the use of extended regular
expression notations, for example in Perl and in much of the Unix world.

To see what hierarchical regular expressions can do, consider the following example
(from [Murphy and Notkin 1996]):

[  <type>  ]  <functionName>  \(  [  {<formalArg>  }1  ]  \)  [  {  <type>
<argDecl>  ;  }1  ]  \{

<calledFunctionName>  \(  [  {  <parm>  }1  ]  \)

To  read  these  patterns,  note  two  conventions:  Expressions  enclosed  in  square  brack¬ 
ets  are  optional.  Reserved  tokens,  like  brackets  and  parentheses,  can  be  quoted  using  \. 

The job of these two regular expressions is to extract static call relationships among
functions. The first pattern looks for function definitions. It will match an optional type
statement, followed by a function name, an open parenthesis, an optional list of formal
parameters, a close parenthesis, an optional list of type declarations for the formal
parameters (each terminated by a semicolon), and an opening curly brace, which marks the
beginning of the function body. The names in angle brackets just give names to the tokens
that they match. Once these pieces have been matched, we assume that a function body
comes next, followed by a closing curly brace. We want to find, within that function body,
instances of calls to other functions. We don't care about any other code. So we use the
second pattern, which is a daughter of the first. Once the first pattern matches, the second
one may also match. But, at the same time, additional instances of the first pattern will also
be sought. It isn't necessary, for example, to find the matching right curly brace first. So the
pattern matching is robust, even in the face of mismatched delimiters in the code.
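
The parent/daughter arrangement can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the tool described in [Murphy and Notkin 1996]: the two regular expressions are rough stand-ins for the patterns above, the scan is line by line, and the names DEFINITION, CALL, KEYWORDS, and extract_calls are all hypothetical.

```python
import re

# Words that look like calls or definitions but are control keywords.
KEYWORDS = {"if", "while", "for", "switch", "return", "sizeof"}
# Parent pattern: optional type, function name, argument list, opening brace.
DEFINITION = re.compile(r"^\s*(?:\w+\s+)?(\w+)\s*\([^()]*\)\s*\{")
# Daughter pattern: a called function name followed by an open parenthesis.
CALL = re.compile(r"(\w+)\s*\(")

def extract_calls(source):
    """Return (caller, callee) pairs found by a purely lexical scan."""
    calls, current = [], None
    for line in source.splitlines():
        definition = DEFINITION.match(line)
        if definition and definition.group(1) not in KEYWORDS:
            current = definition.group(1)  # a new parent match begins here
            continue
        if current is not None:
            for call in CALL.finditer(line):
                if call.group(1) not in KEYWORDS:
                    calls.append((current, call.group(1)))
    return calls

# The closing braces of main are deliberately missing.
SAMPLE = """\
int main(int argc) {
    init();
    if (argc > 1) {
        run(argc);

void helper() {
    log(1);
}
"""
print(extract_calls(SAMPLE))
```

Because the scan never waits for a matching right curly brace, the definition of helper is still found even though main is never closed.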


H.4.2 Island Grammars


While regular expressions are useful, they lack, for example, the ability to match properly
nested delimiters. And they don't describe a way to build anything like a parse tree of the
code fragments that do match. Context-free grammars do both of those things. But parsing
with context-free grammars, as generally described (for example in Chapter 15), is not
robust. Parsers must find exact matches between input strings and grammars. To solve the
various problems that are generally faced in reverse engineering, on the other hand, we
require parsers that are robust in the face of all of the issues that we mentioned above.

So one idea is to use island grammars of the sort that we described in Section 11.9.
As part of that discussion, we sketched an island grammar [Moonen 2001]. Its purpose,
just like that of the regular expressions we presented above, is to find function
invocations. But, because island grammars are variants of context-free grammars, it can
find expressions with balanced delimiters and it can build parse trees of those expressions.

Island grammars have proved useful in analyzing both old code (legacy software
that may be written in obsolete languages and that has mutated, over the years, beyond
recognition by its original writers) and much newer material, in particular World Wide
Web pages, where straight text is typically interleaved with code in one or several
programming and markup languages.

H.5  Normal  Forms  for  Data  and  for  Querying 
Relational  Databases 

In Section 11.8, we introduced the idea of a normal form and we mentioned two useful normal
forms for context-free grammars: Chomsky normal form and Greibach normal form.
Throughout the rest of Part III we used those forms on several occasions to make it easy to
define algorithms that operate on grammars. We also introduced restricted normal form for
PDAs and used it to simplify the design of algorithms that operate on PDAs. Later, in
Section 28.4, we introduced two normal forms for Boolean formulas and we exploited
them in our proof that SAT (the language of satisfiable Boolean formulas) is NP-complete.

But the idea of a normal form as a way to simplify the design and implementation of
an algorithm is useful in a much wider variety of contexts than those. For example, normal
forms are widely used in the design both of databases and their interfaces. In this
section, we sketch one way in which a normal form can be used in the design of a
graphical user interface for relational databases.

Programmers can write database queries in programming languages such as SQL
(which we discuss briefly in Q.1.1). But nonprogrammers also use databases. They need
an interface tool that is easy to use and they are typically able to get by with substantially
less expressive power than languages like SQL offer.

The Query by Example (or QBE) grid was proposed, in [Zloof 1975], as a tool for
such users; it has since been implemented in commercial relational database systems.
The QBE idea is simple. Imagine a grid such as the one shown in Figure H.5(a). The
column headings correspond to fields in database tables or in other queries. A user creates
a grid by dragging field names into the grid; each name creates a new column. So
the grid we just considered could have been built by a user of a database that records a
company's suppliers, along with each supplier's products and their prices.

Once a grid with all the required fields has been created, the user can write a particular
query by inserting values into the cells of the grid. So, for example, one could write
the simple query shown in Figure H.5(b). The constraints in the nonblank cells in a row
of the grid are ANDed together to form a query. So this grid corresponds to the query
“Find all records where Category is fruit and Supplier is Aabco.”

Disjunctive  queries  can  be  constructed  by  using  more  than  one  row.  The  constraints 
from  multiple  rows  are  ORed  together  to  form  a  complete  query.  So  the  grid  shown  in 
Figure  H.5(c)  corresponds  to  the  query,  “Find  all  records  where  Category  is  fruit  or 
Category  is  vegetable.” 

ANDs and ORs can be combined. The constraints from each row are first ANDed
together. Then the rows are ORed together. So, for example, consider the query:

(Category=fruit AND Supplier=Aabco) OR (Category=vegetable AND
Supplier=Bortrexco).

It can be written as the QBE grid shown in Figure H.5(d). But now consider the query:

(Category=fruit OR Category=vegetable) AND (Supplier=Aabco OR
Supplier=Bortrexco).

If we try to write this query directly in a QBE grid, we realize that, because the QBE
interpreter first ANDs all constraints within a row and then ORs together all the rows,
every QBE query is effectively in disjunctive normal form. In other words, each query is a
disjunction of subexpressions each of which is a conjunction of primitive constraints. But,
to every logical expression, there corresponds an equivalent expression in disjunctive normal
form. (We proved this claim for Boolean logic as Theorem B.3. The corresponding
claim for first-order logic can be proved similarly.) So we can rewrite our query as:

(Category=fruit AND Supplier=Aabco) OR
(Category=fruit AND Supplier=Bortrexco) OR
(Category=vegetable AND Supplier=Aabco) OR
(Category=vegetable AND Supplier=Bortrexco).

From  this  form,  it  can  easily  be  written  as  the  QBE  grid  shown  in  Figure  H.5(e). 
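
The rewriting step used here, distributing AND over OR, can be sketched in a few lines of Python. The sketch assumes that a query arrives as a conjunction of disjunctions of (field, value) constraints; that representation, and the names to_qbe_rows and matches, are assumptions made for this illustration and are not part of any QBE product.

```python
from itertools import product

def to_qbe_rows(conjuncts):
    """conjuncts is a list of disjunctions, each a list of (field, value)
    constraints. Picking one constraint from every disjunction yields one
    row of an equivalent QBE grid (one conjunction of the DNF)."""
    return [list(choice) for choice in product(*conjuncts)]

def matches(record, rows):
    """QBE semantics: AND the constraints within a row, then OR the rows."""
    return any(all(record.get(f) == v for f, v in row) for row in rows)

query = [
    [("Category", "fruit"), ("Category", "vegetable")],
    [("Supplier", "Aabco"), ("Supplier", "Bortrexco")],
]
rows = to_qbe_rows(query)
for row in rows:
    print(row)
```

For the query above this produces four rows, in the same order as the four conjunctions of the rewritten query. In general the number of rows is the product of the sizes of the disjunctions, so the grid can grow quickly.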

The QBE grid is a simple structure and it is easy for people to learn to use. It is more
expressively powerful than its obvious structure predicts because disjunctive normal form
is, just as its name suggests, a normal form. In other words, while not all logical expressions
are in that form, all of them can be converted into an equivalent expression that is.

FIGURE H.5 Representing queries in QBE grids.

Appendix I Applications: Networks


The  theory  that  we  have  described  in  this  book  is  useful  in  describing,  at  many  levels, 
the  structure  of  networks  and  the  way  they  can  be  used.  We’ll  introduce  a  few  of  them: 

•  the  definition  of  network  communication  protocols, 

•  monitoring  and  maintaining  networks, 

•  exploiting network resources for problem solving: the Semantic Web, and

•  network security (or the lack of it): hackers and viruses.

We'll discuss the first three of these issues in this chapter. We'll talk about the
remaining one in the next chapter, in the larger context of computer security.


I.1 Network Protocols

The  job  of  a  network  is  to  enable  efficient  and  reliable  communication  between  hosts. 
To  make  any  kind  of  physical  network  suitable  for  practical  use  requires  solving  all  of 
the  following  problems: 

•  error  control:  Messages  may  be  corrupted  or  lost  or  reordered  as  they  are  being  sent. 

•  flow  control:  The  receiving  host  may  not  be  able  to  process  messages  as  fast  as  the 
sending  host  can  send  them. 

•  bandwidth limitation: The network itself has a limit on how fast data can be transmitted.
If data are sent faster than they can be transmitted, they will be lost. So it is
particularly important that the network never be forced to sit idle since idle time
throws away capacity.

To solve these problems requires the definition of one or more communication
protocols, i.e., shared conventions for the transmission of data in one direction and
acknowledgements (that the data have been received) in the other direction. Rather than
attempting to describe all of the required functionality as a single protocol, it is now
common practice to provide a protocol stack. Protocols at layer n of the stack make
use of the functionality provided at layers n - 1 and below. For example, the Internet
protocol stack has five layers, as shown in Figure I.1.

Layer        Responsibility                                  Example protocols
Application  Supports applications such as the World Wide    HTTP, SMTP, FTP
             Web and email.
Transport    Transmits complete messages between             TCP, UDP
             application clients and servers.
Network      Relays messages through a series of switches    IP
             (routers) from source to destination.
Link         Transmits messages from one node to the next.   Ethernet
Physical     Transmits bits across a physical network.       Copper wire, coaxial cable,
                                                             radio frequency

FIGURE I.1 The Internet protocol stack.

Many kinds of communication protocols can usefully be modeled as communicating
finite state machines that never halt. Each process (machine) simply loops forever,
sending and receiving messages. So, more precisely, the models that we are about to
build are Büchi automata, as described in Section 5.12, but without the distinction
between accepting and nonaccepting states. In the rest of this section, we will show
automata that correspond to the explicit communication actions that are required by a
few important network communication protocols. Note that, in all of these models, the
finite state automata will capture just the communication state of the corresponding
processes. Additional state is required to encode the data that are being transmitted.

The most basic protocol we can imagine is illustrated in Figure I.2. The horizontal
axis corresponds to time. The sender simply sends data messages (indicated by the
boxes labeled D) whenever it is ready to do so. The hope is that the receiver receives
them (at some time after they are sent).


FIGURE I.2 A very simple protocol.

FIGURE I.3 Sender and receiver models for the simple protocol.


Finite state models of the sender and the receiver using this protocol are very simple.
In constructing these models, we assume that there is a higher level process on one side
that invokes the sender when it has data to send and a higher level process on the other
side that will be notified by the receiver whenever data have arrived. Although we won't
handle this part of the process in the models we are about to build, we can note that the
sender will maintain a FIFO (first-in, first-out) queue of messages that it has been told to
send. Each time it is ready, it will remove and send the message at the head of that queue.

In writing our finite state models, we will use (S) to correspond to the sending of a
message and (R) to correspond to the receiving of one. We'll use D to correspond to a
message containing data. (In other protocols that we are about to describe, there will
be other kinds of messages as well.) So we have the sender and receiver models shown
in Figure I.3. The sender waits until it is given data to send. At that point, it changes
state, sends a data message, and then returns to the waiting state. The receiver waits
until data arrive. When that happens, it moves to the active state, in which it delivers
the data. When it finishes, it returns to the waiting state.
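
The two machines just described can be written down as transition tables. The sketch below is one possible encoding in Python, not a transcription of the figure: the event labels "data to send" and "deliver", the use of the name "active" for the sender's second state, and the helper run are assumptions made for illustration.

```python
# Transition tables of the form (state, event) -> next state.
# "(S)D" stands for sending a data message, "(R)D" for receiving one.
SENDER = {
    ("waiting", "data to send"): "active",
    ("active", "(S)D"): "waiting",
}
RECEIVER = {
    ("waiting", "(R)D"): "active",
    ("active", "deliver"): "waiting",
}

def run(machine, start, events):
    """Follow the events from the start state; return the states visited."""
    state = start
    trace = [state]
    for event in events:
        state = machine[(state, event)]  # KeyError means an illegal event
        trace.append(state)
    return trace

print(run(SENDER, "waiting", ["data to send", "(S)D"]))
print(run(RECEIVER, "waiting", ["(R)D", "deliver"]))
```

Neither table has an accepting state, which reflects the point made earlier: these machines simply loop forever, so every run of events returns to waiting.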

This  simple  protocol  is  efficient.  The  sender  is  free  to  exploit  all  the  bandwidth  of 
the  network  by  continuously  sending  data.  But  this  protocol  fails  to  address  either  of 
the  first  two  concerns  we  mentioned  above. 

•  If a message is corrupted or if it simply fails to arrive, there is no mechanism by
which the sender can discover the problem and retransmit.

•  If  the  sender  is  sending  messages  faster  than  the  receiver  can  retrieve  them,  process 
them,  and  clear  its  buffers,  then  data  will  be  lost.  In  this  case,  the  sender  needs  to  slow 
down.  But,  again,  there  is  no  mechanism  for  telling  the  sender  that  there  is  a  problem. 


I.1.1 Stop-and-Wait

A family of protocols called Automatic Repeat reQuest (or ARQ) protocols have been
designed to solve the two problems that we have just described. In an ARQ protocol
the receiver communicates back to the sender and the sender exploits that communication
to help it determine when a failure has occurred and a message should be retransmitted.
A simple subfamily of ARQ protocols is called Stop-and-Wait.

A very basic Stop-and-Wait protocol is illustrated in Figure I.4. Now there are
two kinds of messages: data messages (labeled D) and acknowledgement messages
(labeled ACK). The sender waits for an acknowledgement that one message has
been received before sending the next one.

With just one additional state in each model, we can describe the behavior of senders
and receivers using this new protocol, as shown in Figure I.5.

The Stop-and-Wait protocol that we have just described solves the flow control
problem that existed for the simpler case. The sender will never send a second message
until it knows that the receiver has successfully delivered the data from the first one.
And it solves one error control problem: If a data message is lost, the sender will
retransmit it after it times out waiting for an ACK. If a data message is corrupted and the
receiver can tell that (for example by using checksum bits), the receiver can simply fail
to send an ACK, the sender will time out, and the message will be resent. There are also
variants of the Stop-and-Wait protocol in which explicit negative ACK messages
(NACKs) are sent when a message arrives corrupted.


FIGURE I.5 Sender and receiver models for the Stop-and-Wait protocol.



FIGURE I.6 What happens when a delayed message eventually shows up and its ACK is
confused with another.

But other error control problems remain:

•  If  a  data  message  arrives  successfully  but  its  corresponding  ACK  message  gets  lost, 
the  sender  will  time  out  and  then  resend  the  data  message.  But  then  the  receiver 
will  receive  two  copies  of  the  same  message.  It  has  no  way  to  know  that  it  has  just 
gotten  a  second  copy  of  the  first  message  rather  than  a  first  copy  of  a  next  message. 

•  Suppose that the sequence of events shown in Figure I.6 occurs. The first data
message (labeled D0) is delayed until after the sender times out waiting for it to
be acknowledged. So that first message will be resent. It arrives and an
acknowledgement is sent and received. So the sender sends a second data message (labeled
D1). It gets lost. But, meanwhile, the original copy of the first message arrives and is
acknowledged. The subscripts in the figure are just to enable us to envision the
events. There are no subscripts attached to any of the messages. So the sender, when
it gets a second ACK, thinks that its second message was received. It goes on to
send the third one.

I.1.2 Alternating Bit Protocol

Notice that if subscripts, of the sort we used in the last example, were actually present
in data messages and in ACKs, we could solve both of the Stop-and-Wait protocol
problems that we just described. The next protocol that we'll describe, the Alternating Bit
protocol, doesn't add arbitrary subscripts. It does, however, add a single control bit
to each data message and to each ACK. The bit will alternate values with each
transmission. By convention, the receiver will flip the bit before sending an ACK, so the
message that acknowledges the receipt of D0 will be ACK1 (indicating that the
sender's next data message should be D1) and vice versa. Figure I.7(a) illustrates the
straightforward case of the Alternating Bit protocol.

A troublesome case, like the one we showed in Figure I.6, will be handled by the
Alternating Bit protocol as shown in Figure I.7(b). The second ACK1 is simply discarded
as redundant since the sender already knows that D0 was received. The sender will time
out waiting to get an ACK0 (acknowledging receipt of D1). So it will (correctly) resend
D1. This same mechanism makes it possible for the receiver to tell when duplicate
messages are received. Whenever this happens, the second copy will simply be discarded.

To describe the behavior of senders and receivers that exploit this new protocol
requires two copies of each of the states that were needed for the Stop-and-Wait
protocol. One copy corresponds to handling data whose control bit is 0; the other to handling
data whose control bit is 1. The new models are shown in Figure I.8. Both the sender's
and receiver's start state is waiting0.

The Alternating Bit protocol does not solve all error control problems. For example, it
does not address the problem of messages that are received but corrupted. It also won't
work if a message is delayed long enough that its parity matches that of a more recently
transmitted message. But its most serious problem, as a practical protocol, is throughput.
As with any Stop-and-Wait protocol, the network must sit idle while the sender waits for
the receiver to process a message and for an ACK to be sent and received.
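
The bookkeeping that the control bit makes possible can be sketched as two small Python classes. This is a simplified illustration, not a full implementation: the channel and the timers are left out, the events of the troublesome case are replayed by hand, and the class and method names are hypothetical.

```python
class Sender:
    def __init__(self, messages):
        self.queue = list(messages)  # FIFO queue of data still to be sent
        self.bit = 0                 # control bit of the current message

    def current(self):
        """The (bit, payload) pair to send, or resend after a timeout."""
        return (self.bit, self.queue[0])

    def on_ack(self, ack_bit):
        """Accept an ACK only if it carries the flipped bit; else discard."""
        if self.queue and ack_bit == self.bit ^ 1:
            self.queue.pop(0)
            self.bit ^= 1
            return True
        return False


class Receiver:
    def __init__(self):
        self.expected = 0    # control bit of the next new data message
        self.delivered = []

    def on_data(self, bit, payload):
        """Deliver new data, drop duplicates; return the bit of the ACK."""
        if bit == self.expected:
            self.delivered.append(payload)
            self.expected ^= 1
        return bit ^ 1       # the receiver flips the bit in its ACK


sender, receiver = Sender(["m0", "m1"]), Receiver()
ack = receiver.on_data(*sender.current())   # the resent copy of D0 arrives
first = sender.on_ack(ack)                  # ACK1 accepted; D1 is sent and lost
stale = receiver.on_data(0, "m0")           # the delayed original D0 shows up
second = sender.on_ack(stale)               # a second ACK1: discarded
ack = receiver.on_data(*sender.current())   # D1 is resent after a timeout
third = sender.on_ack(ack)                  # ACK0 accepted
print(first, second, third, receiver.delivered)
```

The duplicate copy of the first message is never delivered twice, and the redundant ACK does not cause the sender to skip ahead.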


I.1.3 Sliding Window Protocol


The wasted bandwidth problem can be solved by a more sophisticated ARQ technique,
the Sliding Window protocol, that assigns sequence numbers (rather than alternating
bits) to the messages that a sender sends to a receiver. As before, data messages
are initially entered into a FIFO queue that is maintained by the sender. They are
assigned sequence numbers as they enter the queue and they will be transmitted in the
order in which they enter the queue.


Any specific use of the Sliding Window protocol begins by choosing a window size
w. The window is then placed over the first w messages. We'll say that those messages
are in the send window. The sender may send (without waiting for acknowledgements)
any message that is in the send window. The send window can, in turn, be shifted to the
right as ACK messages are received.

The Sliding Window protocol is used for sending messages on the Internet. It is
illustrated in Figure I.9. Each box corresponds to a data message to be sent from the
sender to the receiver. Messages that have been sent are shown in the sender's queue
with diagonal hatch lines. Messages that have been received are shown with hatch
lines in the receiver's queue. The sender begins transmitting by sending, in order, the
messages in the send window (i.e., the first w messages). It will not wait for an
acknowledgement of message n before sending message n + 1. It will, however, expect
to be told of the arrival of all data messages it sends and it will resend any message on
which it times out before it receives an ACK that acknowledges that message.

FIGURE I.8 Sender and receiver models for the Alternating Bit protocol.

FIGURE I.9 The Sliding Window protocol. (Labels: in the sender's queue, the send window
and the message being sent; in the receiver's queue, the receive window, the next
expected message, and messages received out of order.)

In any cumulative acknowledgement protocol, of which the Sliding Window protocol
is one example, an ACKn message acknowledges receipt of all data messages numbered
up to n - 1. So, as shown in the diagram, the receiver may have received some
messages that have not been acknowledged; they won't be until all the messages before
them in the sequence have been successfully received. This means that, if the sender
receives an ACKn message, then it knows that all messages numbered up to n - 1 have
been received. At that point, the send window can be slid to the right so that the lowest
numbered message it contains is n. Each time the window slides, the sender may resume
sending messages. It need only stop and wait when it has sent all messages in the
current send window. For a more formal treatment of cumulative acknowledgement
protocols, including Sliding Window, see [Gouda 1998].
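
The window arithmetic can be sketched as follows. The sketch tracks only sequence numbers; timeouts, retransmission, and the channel itself are omitted, messages are assumed to be numbered from 0, and the names SlidingWindowSender, sendable, on_ack, and cumulative_ack are hypothetical.

```python
class SlidingWindowSender:
    def __init__(self, num_messages, window_size):
        self.num_messages = num_messages
        self.w = window_size
        self.base = 0          # lowest-numbered message in the send window
        self.next_to_send = 0  # lowest-numbered message not yet sent

    def sendable(self):
        """Send everything in the window that has not been sent yet."""
        limit = min(self.base + self.w, self.num_messages)
        batch = list(range(self.next_to_send, limit))
        self.next_to_send = max(self.next_to_send, limit)
        return batch

    def on_ack(self, n):
        """ACKn acknowledges every message numbered up to n - 1."""
        if n > self.base:
            self.base = min(n, self.next_to_send)  # slide the window right


def cumulative_ack(received):
    """Receiver side: the lowest sequence number not yet received."""
    n = 0
    while n in received:
        n += 1
    return n


sender = SlidingWindowSender(num_messages=6, window_size=3)
print(sender.sendable())          # [0, 1, 2]
print(sender.sendable())          # []  (window exhausted: stop and wait)
sender.on_ack(2)
print(sender.sendable())          # [3, 4]
print(cumulative_ack({0, 1, 3}))  # 2  (message 3 arrived out of order)
```

Message 3 in the last line has been received but cannot yet be acknowledged, which is exactly the situation of the out-of-order messages in the receiver's queue.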

The Sliding Window protocol cannot usefully be modeled as a finite state machine
because of the need to store the message sequence numbers. (Of course, if we assume
a maximum word size for storing those numbers, we could build a corresponding FSM,
but it would be too complex to be useful as an analysis tool.) As we'll see in the next
section though, we can continue to use finite state machines as tools for describing
higher-level protocols, including ones that are used on the Internet and that exploit the
Sliding Window protocol. We'll simply take advantage of the fundamental structure of
a protocol stack and treat the action of correctly sending a message as an atomic event
without worrying about how it happens.


I.1.4 TCP


In the Internet protocol stack, the transport layer sits immediately below the application
layer. So the transport layer protocol is invoked by application protocols
such  as  HTTP,  SMTP,  and  FTP.  A  practical  transport  layer  protocol  must  be  efficient 
and  it  must  address  some  issues  that  we  have  not  considered  up  until  this  point.  For 
example,  it  must  enable  data  messages  to  be  sent  in  both  directions  between  two 
hosts. 

The transport layer protocol used by the Internet is the Transmission Control Protocol
(or TCP). A TCP connection is established by a three-step handshake procedure:
One  host  initiates  the  connection,  the  other  acknowledges  it,  and  the  originator  then 
confirms  the  acknowledgement.  Once  a  connection  is  open,  data  can  be  transmitted 
between  the  two  hosts  until  it  is  closed. 

Internet standards are defined by a set of documents called RFCs. The functional
specification of TCP can be described as a simple finite state transducer, shown in
Figure I.10, exactly as it appears in [RFC 793] as Figure 6.

FIGURE I.10 A finite state transducer model of TCP.

The model is described, again in [RFC 793], as follows:

A connection progresses through a series of states during its lifetime. The states
are: LISTEN, SYN-SENT, SYN-RECEIVED, ESTABLISHED, FIN-WAIT-1,
FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT, and the
fictional state CLOSED. CLOSED is fictional because it represents the state
when there is no TCB [Transmission Control Block], and therefore, no connection.
Briefly the meanings of the states are:

•  LISTEN - represents waiting for a connection request from any remote TCP
and port.

•  SYN-SENT - represents waiting for a matching connection request after having
sent a connection request.

•  SYN-RECEIVED - represents waiting for a confirming connection request
acknowledgment after having both received and sent a connection request.

•  ESTABLISHED - represents an open connection, data received can be delivered
to the user. The normal state for the data transfer phase of the connection.

•  FIN-WAIT-1 - represents waiting for a connection termination request from
the remote TCP, or an acknowledgment of the connection termination request
previously sent.

•  FIN-WAIT-2 - represents waiting for a connection termination request from
the remote TCP.

•  CLOSE-WAIT - represents waiting for a connection termination request from
the local user.

•  CLOSING - represents waiting for a connection termination request
acknowledgment from the remote TCP.

•  LAST-ACK - represents waiting for an acknowledgment of the connection
termination request previously sent to the remote TCP (which includes an
acknowledgment of its connection termination request).

•  TIME-WAIT - represents waiting for enough time to pass to be sure the
remote TCP received the acknowledgment of its connection termination request.

•  CLOSED - represents no connection state at all.

A  TCP  connection  progresses  from  one  state  to  another  in  response  to  events.  The 
events  are  the  user  calls,  OPEN,  SEND,  RECEIVE,  CLOSE,  ABORT,  and  STATUS; 
the incoming segments, particularly those containing the SYN, ACK, RST and FIN
flags; and timeouts.

Each transition in this diagram has a label of the form <event>/<action>, where <event> is
the event that causes the transition to occur and <action> is the action that is executed
when the transition is taken. The diagram ignores error conditions and other actions
that are not directly connected to the state changes.
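
A transducer of this kind is easy to encode as a table from (state, event) pairs to (action, next state) pairs. The Python sketch below covers only part of the diagram, namely the usual opening and closing paths. The event and action strings are paraphrases chosen for this illustration rather than the exact labels of the figure, and the helper name transduce is hypothetical.

```python
# (state, event) -> (action, next state). A partial table: simultaneous
# open, the CLOSING state, resets, and error handling are all omitted.
TRANSITIONS = {
    ("CLOSED", "active OPEN"): ("create TCB, snd SYN", "SYN-SENT"),
    ("CLOSED", "passive OPEN"): ("create TCB", "LISTEN"),
    ("LISTEN", "rcv SYN"): ("snd SYN,ACK", "SYN-RECEIVED"),
    ("SYN-SENT", "rcv SYN,ACK"): ("snd ACK", "ESTABLISHED"),
    ("SYN-RECEIVED", "rcv ACK of SYN"): ("", "ESTABLISHED"),
    ("ESTABLISHED", "CLOSE"): ("snd FIN", "FIN-WAIT-1"),
    ("ESTABLISHED", "rcv FIN"): ("snd ACK", "CLOSE-WAIT"),
    ("FIN-WAIT-1", "rcv ACK of FIN"): ("", "FIN-WAIT-2"),
    ("FIN-WAIT-2", "rcv FIN"): ("snd ACK", "TIME-WAIT"),
    ("CLOSE-WAIT", "CLOSE"): ("snd FIN", "LAST-ACK"),
    ("LAST-ACK", "rcv ACK of FIN"): ("", "CLOSED"),
    ("TIME-WAIT", "timeout"): ("delete TCB", "CLOSED"),
}

def transduce(start, events):
    """Run the transducer; return the final state and the actions emitted."""
    state, actions = start, []
    for event in events:
        action, state = TRANSITIONS[(state, event)]
        actions.append(action)
    return state, actions

# The three-step handshake, seen from each side.
print(transduce("CLOSED", ["active OPEN", "rcv SYN,ACK"]))
print(transduce("CLOSED", ["passive OPEN", "rcv SYN", "rcv ACK of SYN"]))
```

Both runs end in ESTABLISHED: the initiator sends a SYN and later an ACK, while the listener answers the SYN with a combined SYN and ACK.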


I.2 Modeling Networks as Graphs

It is natural to model a network as a graph in which the processors correspond to vertices
and the links correspond to edges. As soon as we do that, it becomes clear that
many  of  the  problems  that  we  need  to  solve  when  we  build  and  analyze  networks 
correspond  to  the  graph  problems  that  we  discussed  in  Part  V.  We’ll  mention  a  few 
examples  here. 

Consider the problem of designing a physical network that connects a set of points.
We want to find the cheapest way to build the network. We can show that there is an
efficient algorithm for solving this problem. Let G be a graph whose vertices
correspond to the points and whose edges correspond to the costs of laying cable (or wires,
or whatever) between pairs of points. Recall that a spanning tree T of G is a subset of
the edges of G such that:

•  T contains no cycles, and

•  every vertex in G is connected to every other vertex using just the edges in T.

If G is a weighted graph, then the cost of a spanning tree is the sum of the costs
(weights) of its edges. Define a tree T to be a minimum spanning tree of G iff it is a
spanning tree and there is no other spanning tree whose cost is lower than that of T. In
Section 28.1.6, we described the minimum spanning tree problem as the language
MST = {<G, cost> : G is an undirected graph with a positive cost attached to each of
its edges and there exists a minimum spanning tree of G with total cost less than cost}.
We showed that MST is in P. We described one efficient technique, Kruskal's algorithm,
for finding minimum spanning trees.

The  cheapest  way  to  build  a  network  that  connects  the  points  in  G  is  to  lay  cable 
along  a  minimum  spanning  tree  of  G.  So  the  network  design  problem  can  be  reduced 
to the minimum spanning tree problem. Since we have an efficient way to solve MST,
we have an efficient way to design our network.
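
As a reminder of how the network design problem might be solved in practice, here is a compact Python sketch of Kruskal's algorithm using a union-find structure. It assumes that the points are numbered from 0, that each possible cable run is given as a (cost, u, v) triple, and that the graph is connected; the function name kruskal and the sample data are made up for illustration.

```python
def kruskal(num_vertices, edges):
    """edges is a list of (cost, u, v) triples over vertices 0..num_vertices-1.
    Returns (total cost, chosen edges) of a minimum spanning tree."""
    parent = list(range(num_vertices))

    def find(x):
        # Follow parent links to the representative, halving paths on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, tree = 0, []
    for cost, u, v in sorted(edges):   # consider cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:           # the edge joins two components: keep it
            parent[root_u] = root_v
            total += cost
            tree.append((u, v))
    return total, tree

# Hypothetical cable costs between four sites.
links = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 2, 3), (5, 1, 3)]
print(kruskal(4, links))
```

An edge is rejected exactly when its endpoints are already connected, so the chosen edges never form a cycle.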

Next  we  consider  the  problem  of  finding  the  optima!  route  for  a  message  through  a 
network.  Again  we'll  describe  the  network  as  a  graph  G.  Let  the  vertices  of  G  corre¬ 
spond  to  network  nodes  and  let  the  edges  correspond  to  network  links.  We  can  reduce 
the  message  routing  problem  to  the  problem  of  finding  the  shortest  path,  from  source 
to  destination,  through  G.  In  Section  28.7.4,  we  described  the  language: 

• SHORTEST-PATH = {<G, u, v, k> : G is an unweighted, undirected graph, u
and v are vertices in G, k ≥ 0, and there exists a path from u to v whose length is
at most k}.

We  showed  that  SHORTEST-PATH  is  in  P.  Unfortunately,  SHORTEST-PATH  is 
not  exactly  what  we  need  to  solve  the  message  routing  problem  because  it  is  stated  in 
terms  of  unweighted,  rather  than  weighted,  graphs.  We  need  to  use  weights  to  describe 
the  costs  of  the  individual  network  links.  But,  as  we  mentioned  in  Section  28.7.4,  there 
also  exist  efficient  algorithms  for  finding  paths  through  weighted  graphs. 
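
One well-known efficient algorithm for weighted graphs with nonnegative link costs is Dijkstra's algorithm. Whether it is the algorithm that Section 28.7.4 has in mind is an assumption here. The sketch below, with the hypothetical function name cheapest_route_cost and a made-up four-node network, returns the cost of the cheapest route between two nodes.

```python
import heapq


def cheapest_route_cost(graph, source, destination):
    """Dijkstra's algorithm.

    graph maps each node to a dict {neighbor: nonnegative link cost}.
    Returns the cost of a cheapest route, or None if none exists.
    """
    best = {source: 0}
    frontier = [(0, source)]              # priority queue of (cost so far, node)
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == destination:
            return cost
        if cost > best.get(node, float("inf")):
            continue                      # stale queue entry, already improved
        for neighbor, link_cost in graph[node].items():
            new_cost = cost + link_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost, neighbor))
    return None


# Hypothetical network: link costs are invented for illustration.
network = {
    "a": {"b": 7, "c": 2},
    "b": {"a": 7, "c": 3, "d": 1},
    "c": {"a": 2, "b": 3, "d": 8},
    "d": {"b": 1, "c": 8},
}
route_cost = cheapest_route_cost(network, "a", "d")
```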

Next  we  consider  the  problem  of  checking  a  network  to  verify  that  all  links  are 
working  properly.  The  shortest  way  to  traverse  all  the  links  in  a  network  is  via  an 
Eulerian circuit. Recall that an Eulerian circuit through a graph G is a path that starts
at some vertex s, ends back at s, and traverses each edge in G exactly once. In Section
28.1.5 we described the problem of finding an Eulerian circuit as the language:

•  EULERIAN-CIRCUIT  =  {<G>  :G  is  an  undirected  graph  and  G  contains  an 
Eulerian  circuit}. 

We  showed  that  EULERIAN-CIRCUIT  is  in  P.  So  there  exists  an  efficient  algo¬ 
rithm  for  solving  the  link  checking  problem. 
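
A minimal sketch of such a check follows. It relies on Euler's classical characterization: an undirected graph has an Eulerian circuit iff every vertex has even degree and all of its edges lie in a single connected component. The function name has_eulerian_circuit and the edge-list input are assumptions made for illustration.

```python
from collections import defaultdict


def has_eulerian_circuit(edges):
    """edges is a list of (u, v) pairs describing an undirected graph."""
    adjacent = defaultdict(list)
    for u, v in edges:
        adjacent[u].append(v)
        adjacent[v].append(u)
    if not adjacent:
        return True                       # no links to traverse
    # Every vertex must have even degree.
    if any(len(neighbors) % 2 for neighbors in adjacent.values()):
        return False
    # All vertices that touch an edge must be reachable from one another.
    start = next(iter(adjacent))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adjacent[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(adjacent)


square = [(1, 2), (2, 3), (3, 4), (4, 1)]
chain = [(1, 2), (2, 3)]
```

Both the degree count and the depth-first search take time linear in the number of links.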

So far, our theory has yielded positive results for the network problems we have
wished to solve. Unfortunately, that is not always so, as we'll see in our final example.
Consider the problem of finding a minimal set of network nodes at which we can
place monitors so that we can observe the status of every network link. Again, we'll
describe the network as a graph G, whose vertices correspond to network nodes and
whose edges correspond to network links. Recall that a vertex cover C of a graph G
with vertices V and edges E is a subset of V with the property that every edge in E
touches at least one of the vertices in C. We can reduce the problem of finding a minimal
set of monitor sites to the problem of finding a smallest vertex cover of G. In
Section 28.6.5, we described the vertex cover problem as the language:

• VERTEX-COVER = {<G, k> : G is an undirected graph and there exists a vertex
cover of G that contains at most k vertices}.

Unfortunately, we showed that VERTEX-COVER is NP-complete. So it is unlikely
that  there  exists  an  efficient  algorithm  for  solving  it. 
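
The contrast between checking a proposed cover and finding a smallest one can be seen in a short sketch. Checking is fast, but the search below simply tries subsets in order of increasing size and so takes exponential time in the worst case. The function names and the sample network are assumptions made for illustration.

```python
from itertools import combinations


def is_vertex_cover(edges, candidate):
    """True iff every edge touches at least one vertex in candidate."""
    return all(u in candidate or v in candidate for u, v in edges)


def smallest_vertex_cover(vertices, edges):
    """Exhaustive search over subsets, smallest first (exponential time)."""
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            if is_vertex_cover(edges, set(subset)):
                return set(subset)


# Hypothetical network: nodes a..e and four links.
nodes = ["a", "b", "c", "d", "e"]
links = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e")]
monitor_sites = smallest_vertex_cover(nodes, links)
```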

Exploiting  Knowledge:  The  Semantic  Web 

Networks  enable  two  or  more  computers,  and  their  users,  to  communicate.  The  World 
Wide  Web  enables  millions  (possibly  billions)  of  computers,  and  their  users,  to  commu¬ 
nicate.  Hard  problems  get  solved  by  building  software  layers  on  top  of  the  fundamen¬ 
tal  communication  protocols  that  we  have  already  described. 

Hypertext  structure  turns  a  set  of  documents  into  a  web  of  documents  that  people 
can explore. HTML, which we describe in Q.1.2, is a standard hypertext markup language
that makes the documents of the world accessible to the people of the world. But
what  about  making  information  available  to  the  programs  of  the  world?  It  is  no  longer 
possible  for  people  to  manage  the  amount  of  information  that  is  available  on  the 
World  Wide  Web.  We  need  programs  to  help.  It  is  common  to  describe  such  programs 
as  intelligent  agents. 

We  (people)  can  exploit  the  contents  of  the  Web  because  we  can  read  text,  interpret  ta¬ 
bles,  recognize  images,  and  watch  videos.  In  other  words,  we  assign  meaning  to  Web  objects. 
At  some  point,  it  may  be  possible  to  build  automated  agents  that  can  read  the  current  con¬ 
tents  of  the  Web  in  much  the  same  way  people  do.  In  Appendices  L  and  M  we  describe  just 
a  few  of  the  research  questions  that  must  be  solved  to  make  that  happen.  In  the  meantime, 
if  we  want  automated  agents  to  work  for  us  on  the  Web,  we  must  annotate  the  Web  with 
meanings  that  are  stated  in  machine-usable  forms.  And  then  we  must  provide  a  set  of  infer¬ 
ence  rules  and  procedure(s)  for  exploiting  those  meanings  to  solve  problems. 

As  an  example  of  what  we  might  like  an  agent  to  be  able  to  do,  consider  the  Web 
pages  for  two  local  quilt  guilds,  shown  in  Figure  1.11.  By  reading  these  pages,  people 
would  be  able  to  answer  questions  such  as: 

• Find quilt guilds in Texas. The fact that Hometown is in Texas can be gleaned from
the map and, since Texas is the Lone Star State, Nimble Fingers is in Texas.

• Find quilt guilds that meet during the day.

• Find a list of email addresses for presidents of quilt guilds in the American southwest.

• Find quilt shows this fall in Texas.


FIGURE 1.11 Two quilt guild Web pages.

Keyword-based search engines aren't able to discriminate the correct answers to
these questions from many other pages that happen to contain words and phrases like
“quilt guild,” “quilt show,” and “Texas.” To reason correctly about these questions requires
having effective access to the contents of the relevant Web pages. It also requires
having background knowledge about what the terms on the pages “mean.” For
example, it's necessary to know that 10 a.m. is “during the day” but 7:30 p.m. is not, that
Texas is in the American Southwest, and that September is a fall month.

The Semantic Web [Berners-Lee, Hendler and Lassila 2001] is a vision for the
transformation of the World Wide Web into a knowledge store (or knowledge base)
that supports the construction of agents that can answer questions like the ones we just
considered. To make the Semantic Web a reality requires the solution of a host of technical
problems. We focus here on two:

• common description (markup) languages. If knowledge is to be used, its structure
and its meaning must be described. If knowledge is to be shared, one or more
standard description languages need to be defined.

• an inference engine. If knowledge is to be used, there must exist some technique(s)
for reasoning with it so that facts can be combined to solve a user's problem.

Issues  that  we  have  discussed  in  this  book  play  important  roles  in  the  design  of  solu¬ 
tions  to  both  of  these  problems.  In  particular: 

•  to  solve  the  common  description  languages  problem  requires  that  we: 

• design one or more languages that are expressively adequate for the job yet retain
the decidability and tractability properties that we need.

•  exploit  formal  techniques  for  describing  the  languages  so  that  users  around  the 
world  can  share  them. 

• to solve the inference engine problem requires that we design one or more knowledge
representation frameworks that are expressively adequate but that do not
crumble in the face of the undecidability and intractability results that characterize
full first-order logic.

In the rest of this section we'll sketch the definition of a layered set of languages that
can be used to define Web objects and assign meaning to them. Each layer will be able
to exploit the capabilities of the layers beneath it. One way to think of the languages
that  we  are  about  to  describe  is  that,  while  most  current  Web  languages  (such  as 
HTML)  are  designed  to  support  a  common  way  of  displaying  content  on  the  Web,  the 
new  languages  must  be  designed  to  support  a  common  way  of  automated  reasoning 
with  that  content. 


The  Language  of  Universal  Resource  Identifiers  (URIs) 

Any  language  for  describing  the  Web  must  start  with  a  sublanguage  for  describing  the 
Web’s  fundamental  units,  as  well  as  other,  non-Web  objects  that  relate  to  Web  objects 
in  useful  ways.  Call  these  resources.  A  Web  page  can  be  uniquely  identified  by  its  Web 
address,  stated  as  a  URL  (universal  resource  locator).  A  URL  contains  an  access 
method  and  an  actual  path  that  will  find  an  object  on  the  Web.  But  what  about  things 
that  aren't  Web  pages?  For  example,  we  might  want  to  be  able  to  say  that  the  creator  of 
a  particular  Web  page  is  a  person  (not  a  web  location)  whose  name  is  Chris,  who  lives 
in  New  York,  and  who  works  for  Jingle  Co.  To  do  this,  we  need  a  way  to  refer  to  Chris, 
New  York,  and  the  Jingle  Co. 

To make that possible, we'll define a new language, the language of universal resource
identifiers. A universal resource identifier (or URI) specifies a resource. Some
URIs actually describe how to find their associated resource on the Web. For example,
every URL is also a URI. Other kinds of URIs simply provide a “hook” that enables
statements to be made about the resource, whether we know how to find it or not.

A URI (as its name suggests) identifies an object. That object may be a file or
some other structure that contains smaller units. In that case, we may want to be
able to refer to those smaller units individually. We use the fragment notation to do
that. A fragment is the symbol #, followed by a fragment name. So, for example, if
http://www.mystuff.wow/products.html contains descriptions of all my products,
then http://www.mystuff.wow/products.html#widget might point directly
to the description of my widgets.

The syntax for the language of URIs can be described with a BNF-style, context-free
grammar. (In fact, this language is also regular and so it could be defined
with a regular grammar or a regular expression.) We'll use here the convention that
all special characters are metacharacters rather than terminal symbols. So, to use one
as a terminal symbol, it will be quoted. We'll also use the common convention that
any sequence that is enclosed by a [ ] pair is optional. Recall that | separates alternatives,
and Kleene star means zero or more occurrences are allowed. A complete
grammar for URIs on the Web is too long to present, but a simplified excerpt is
shown as follows:

<URI> → <URIbody> [“#” <Fragment>]

<URIbody> → <Scheme> “:” <Hier-part> [“?” <Query>]

<Scheme> → ftp | http | https | mailto | news | ...

<Hier-part> → “//” <Authority> <Path-Absolute> |
              “//” <Authority> <Path-Empty> |
              <Path-Absolute> |
              <Path-Empty> |
              <Path-Rootless>

<Authority> → [<User-info> “@”] <Host> [“:” <Port>]

<Path-Absolute> → “/” [<Segment-1> (“/” <Segment>)*]

<Segment-1> → <a segment with at least one character>


EXAMPLE  1.1  Parsing  URIs 

Using  the  grammar  excerpt  that  we  just  presented,  we  can  produce  the  following 
parse  tree: 

<URI> 
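
The decomposition that such a parse tree expresses can also be sketched in code. The example below uses urlsplit from Python's standard urllib.parse module to split a URI into pieces and labels them with the nonterminal names of the grammar excerpt. The function name uri_parts and the choice of labels are assumptions made for illustration, and urlsplit follows the full URI syntax rather than this simplified excerpt, so the correspondence is only approximate.

```python
from urllib.parse import urlsplit


def uri_parts(uri):
    """Split a URI into pieces named after the grammar's nonterminals."""
    parts = urlsplit(uri)
    return {
        "Scheme": parts.scheme,
        "User-info": parts.username,      # None when absent
        "Host": parts.hostname,
        "Port": parts.port,               # None when absent
        "Path": parts.path,
        "Query": parts.query,
        "Fragment": parts.fragment,
    }


example = uri_parts("http://www.mystuff.wow/products.html#widget")
```

Here the Scheme is http, the Host is www.mystuff.wow, the Path is /products.html, and the Fragment is widget.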


1.3.2  RDF 

Now  that  we  have  a  way  to  name  Web  objects,  we  need  a  markup  language  that  can  be 
used to describe their properties so that they can be exploited. We'll call such descriptions
metadata, i.e., information about other data sources (e.g., Web pages).

There is today no single standard metadata language for the Web. What we'll do
here is to describe one interconnected family of such languages. All of them are evolving
standards maintained by the World Wide Web Consortium (W3C). The bottom layer
of this language family is RDF (Resource Description Framework).

Each  RDF  statement  is  a  triple  that  asserts  a  value  for  some  property  of  some  Web 
resource. Remember that a Web resource is anything that can be named with a URI, so
it  could  be  a  Web  page  or  something  (like  a  person,  a  city,  a  business  or  a  product)  that 
is  external  to  the  Web  but  to  which  we  can  give  a  name. Taken  together,  a  set  of  RDF 
statements  can  be  used  to  describe  a  set  of  relevant  properties  of  some  useful  collec¬ 
tion  of  Web  resources.  Typically  such  an  RDF  description  will  describe  how  some  col¬ 
lection  of  Web  pages  relate  to  each  other  and  how  they  relate  to  some  collection  of 
external  objects. 

Like any language, RDF has a syntax and a vocabulary. Both are substantially more
flexible than in any of the languages that we've considered so far.

• The syntax: We'll begin by describing RDF syntax abstractly in terms of triples. The
meaning  of  an  RDF  expression  is  defined  by  a  semantic  interpretation  function 
that  applies  to  triples.  But  we  also  need  a  “concrete"  syntax,  i.e.,  a  form  for  writing 
strings  that  describe  those  triples.  At  least  two  such  forms  are  in  common  use.  Each 
of  them  comes  with  a  compiler  that  maps  strings  to  triples.  The  Web  community 
doesn't  have  to  agree  on  a  concrete  syntax,  as  long  as  it  agrees  on  the  abstract  one. 

•  The  vocabulary:  It  is  completely  unrealistic  to  assume  that  all  users  of  the  Web  will 
want  to  agree  on  a  single  vocabulary  to  be  used  in  describing  Web  resources  and 
their  properties.  Anyone  who  wants  to  do  so  can  define  an  RDF  vocabulary  and 
place  it  in  a  resource  somewhere.  Then  anyone  who  has  access  to  that  resource  can 
use  one  of  its  terms  by  referring  to  the  resource  and  then  to  some  specific  term. 

Every  RDF  statement  is  a  triple.  So  it  has  three  parts: 

•  a  subject  (the  thing  about  which  a  statement  is  being  made), 

•  a  predicate  (a  property  or  attribute  that  the  subject  possesses),  and 

•  an  object  (the  value  of  the  predicate  for  the  subject). 

The  meaning  of  each  triple  is  that  subject  has  a  property  named  predicate  and  the 
value  of  that  property  is  object.  The  meaning  of  an  RDF  expression  is  the  assertion  of 
the  conjunction  of  all  of  the  triples  that  it  describes.  So  RDF  is  a  logical  language  with 
limited  expressive  power.  In  particular, 

• all predicates are binary. (In other words, each predicate relates exactly two things:
its subject and its object.) This is not a real limitation, though, since other predicates
can be converted into sets of binary ones.

•  the  only  logical  connective  is  AND  (since  an  RDF  expression  is  just  a  list  of  triples, 
all  of  which  are  asserted  to  be  true).  Neither  disjunction  nor  negation  is  allowed. 

So,  ignoring  all  issues  of  syntax,  and  even  of  how  entities  get  named,  we  might  use 
RDF  to  specify  triples  like: 


• (mywebpage, created-by, me).

• (me, lives-in, cityofAustin).

• (Hometown Quilt Guild, organization-focus, quilting).

At its core, RDF is thus a very simple language. One uses it to write triples. There is,
however,  one  way  in  which  its  definition  is  more  complex  than  is  the  definition  of  many 
other  kinds  of  formal  languages.  RDF  is  a  language  for  describing  properties  of  objects 
on  the  Web.  So  there  is  no  central  control  of  what  objects  exist  or  what  they  are  called. 
RDF must provide a naming convention that handles the distributed and dynamic nature
of things on the Web. Its solution to this problem is to use URIs as names. Specifically,
subjects, predicates, and objects are described as follows.


• Subjects: RDF statements are “about” resources. So the subject of every triple must
be a resource, with one exception: there may be “blank” nodes that exist only as placeholders
within an expression. These nodes are “blank” in the sense that they
have no name outside the immediate expression in which they occur. Blank nodes
may be the subject of an RDF statement. So, for example, we may want to say that
the Web page w was created by someone who lives in Maine but we don't know
who that person is. We can create a blank node _:1 and then say:

(w, created-by, _:1)

(_:1, lives-in, stateofMaine)

• Predicates: It is tempting to allow (as suggested by the examples that we have presented
so far) simple predicate names like lives-in, created-by, works-for, etc. But
doing  so  would  pose  two  problems: 

•  Where  would  we  define  the  meanings  of  those  strings? 

•  The  (world-wide)  community  of  World  Wide  Web  users  will  never  be  able  to 
agree  on  a  set  of  predicate  names.  It  must  be  possible  for  smaller  communities 
of  users  (including  communities  of  one)  to  define  the  predicates  they  need  with¬ 
out  having  to  worry  about  what  everyone  else  has  done. 

RDF  solves  both  of  these  problems  by  requiring  that  every  predicate  be  a  URI. 
Anyone  who  can  define  a  URI  can  define  an  RDF  predicate.  We'll  say  more  shortly 
about  how  this  system  works. 

• Objects: An object may be a named resource (specified by a URI), a blank node, or
an element of a primitive data type such as string or integer. We'll notice, by the
way, that strings get used as objects much less frequently than one might think. For
example, we won't want to say that Chris lives in “Maine.” Chris clearly doesn't live
in a string. We'll want to say that Chris lives in a state whose name is “Maine.”

Since URIs play such an important role in RDF, let's say a bit more about them before
we show a few concrete RDF examples.

Entities and predicates in RDF are named by URIs. So they can be defined (using
the RDF Schema language that we'll describe later) by anyone who can create a Web
resource. Then they can be used by anyone who has access to that resource. A single
Web file typically contains the definitions of a collection of related things, each of
which can be uniquely identified within that file by a fragment name. So, suppose that
we have defined a set of terms to be used in describing craft organizations. Then individual
predicates might have names like:

http://www.myisp.net/regusers/mytown/me/craft-stuff#organization-focus

http://www.myisp.net/regusers/mytown/me/craft-stuff#meeting-place

Next,  we  observe  that  URIs  (like  the  ones  we  just  wrote)  tend  to  be  long.  But  many 
of  the  ones  a  particular  RDF  expression  uses  may  share  a  common  prefix  like: 

http://www.myisp.net/regusers/mytown/me/craft-stuff#

Users don't want to have to write that whole string every time they write a subject,
predicate, or object in an RDF expression. The solution to this problem is the use of
namespaces (as they are defined in the markup language XML). To define a namespace,
we simply associate some (generally short, mnemonic) name with a string that is a prefix
of some set of URIs that we want to use. Using XML syntax (which is one common
way in which RDF is written), we can define a namespace by writing, for example:

xmlns:crafts="http://www.myisp.net/regusers/mytown/me/craft-stuff#"

This  XML  namespace  (ns)  definition  maps  the  string  crafts  to  the  long  URI 
shown  to  the  right  of  the  equal  sign.  RDF  then  allows  the  use  of  what  it  calls  qualified 
names or QNAMEs whenever a URI is required. A QNAME has the form:

<name space> “:” <local name>

RDF will form a full URI from a QNAME by appending <local name> to the
string to which <name space> maps. So, having defined the namespace crafts as we
just did, the QNAME crafts:meeting-place is equivalent to:

http://www.myisp.net/regusers/mytown/me/craft-stuff#meeting-place
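
The expansion rule is simple enough to state as a short function. The sketch below is an illustration only: the dictionary namespaces and the function name expand_qname are assumptions, not part of RDF or of any RDF library.

```python
# Maps each namespace name to the URI prefix it stands for.
namespaces = {
    "crafts": "http://www.myisp.net/regusers/mytown/me/craft-stuff#",
}


def expand_qname(qname, namespaces):
    """Form a full URI by appending the local name to the namespace's prefix."""
    prefix, local_name = qname.split(":", 1)
    return namespaces[prefix] + local_name


full_uri = expand_qname("crafts:meeting-place", namespaces)
```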

The  definition  of  the  RDF  language  says  nothing  about  what  vocabularies  can  be 
used  in  RDF  expressions.  Whenever  a  URI  is  required,  any  syntactically  valid  URI  is 
acceptable.  We'll  have  more  to  say  later  about  how  RDF  vocabularies  (of  predicates 
and  things  to  which  predicates  apply)  can  be  defined  (for  example,  how  we  could  have 
defined  the  craft  organization  vocabulary).  For  now,  though,  we’ll  just  mention  that 
there are some public vocabulary definitions that are commonly used in writing
RDF expressions. Each of them has a URI and each of those has a standard namespace
definition. As a shorthand, we'll use those definitions, but remember that in real RDF
code you must explicitly define each namespace first. The namespaces we'll use are:

•  rdf  -  contains  terms  that  have  special  meaning  in  RDF. 

• rdfs - contains terms that have special meaning in RDFS (RDF Schema), a language
that we'll describe below. RDFS is a language for describing RDF vocabularies.

•  owl  -  contains  terms  that  have  special  meaning  in  the  inference  language  OWL 
that  we’ll  describe  below. 

• dc - contains a vocabulary called the Dublin Core. The Dublin Core vocabulary has
been designed for attaching common kinds of metadata to document-like network
objects. So it contains, for example, the predicates Title, Creator, Subject,
Publisher, Language, and so forth.

• foaf - contains a vocabulary called Friend of a Friend (thus foaf). The foaf vocabulary
contains terms that are useful for describing people. So it contains, for example,
the predicates name, title, homepage, interest, weblog, and schoolHomepage.

So, for example, if we write dc:Publisher as a predicate, it is simply a shorthand
for a URI that happens to point to a place on a Web page defined by the Dublin Core
Initiative. As it turns out, on that Web page is some machine-readable information
(written in the language RDF Schema, to be described below) that can help an agent
actually interpret, in a useful way, a triple that uses this predicate.

We've said that we can think of the abstract syntax of RDF as a set of triples. But
what about its “concrete syntax”? In other words, what sequence of symbols must users
write if they want to define an RDF expression? The answer is that users can exploit
any concrete syntax for which a translator into abstract syntax exists. One approach is
to use the simple triple language Notation3 (also called N3). We illustrate that next.


EXAMPLE  1.2  Writing  RDF  in  N3 


A  very  natural  way  to  write  RDF  is  to  use  the  triple  language  N3.  In  this  example, 
we will use seven namespaces. In N3, namespaces are defined with the @prefix
command.  So  we'll  write  those  first,  then  we'll  list  triples. 


@prefix rdf: <the location of the rdf definitions>.
@prefix dc: <the location of the Dublin Core definitions>.
@prefix foaf: <the location of the Friend of a Friend definitions>.
@prefix fooddb: <the location of a fictional food description resource>.
@prefix mystuff: <the location of a fictional resource I've created. It defines things
                  I care about>.
@prefix myco: <the location of a fictional resource that defines significant things in
               my company>.
@prefix places: <the location of a fictional resource that defines places like cities
                 and states>.

myco:bigreport      dc:Creator                 myco:person521.
myco:person521      rdf:type                   foaf:Person.
myco:person521      foaf:firstName             “Chris”.
myco:person521      mystuff:favoritefood       fooddb:chocolate.
fooddb:chocolate    rdf:type                   fooddb:food.
fooddb:chocolate    fooddb:caloriesperounce    150.
myco:person521      places:birthplace          [places:city places:Boston;
                                                places:state places:MA].


What we've said here is that a particular report called bigreport in myco was created
by someone whose identifier in myco is person521. That someone is a person in
the sense defined in foaf. (To say this, we used the RDF-defined predicate type,
which relates a thing to a class of which it is a member.) Person521's first name is the
string “Chris”. Chris's favorite food (in the sense in which I defined it in mystuff) is
something that is defined in fooddb and called chocolate there. Chocolate's calories
per ounce (in the sense defined in fooddb) is the number 150. Finally, person521 was
born in an unnamed place, indicated by an unnamed structure in brackets, whose city
is Boston and whose state is MA (both as defined in places).

A common alternative to N3 is RDF/XML, which exploits the markup language
XML that we describe in Q.1.2. This form is attractive to many users because they are
already familiar with XML and XML parsers are readily available.

Whatever  concrete  syntax  is  used  to  describe  it,  an  RDF  expression  corresponds  to 
a  list  of  triples.  So  another  natural  way  to  think  of  it  is  as  a  labeled,  directed  graph.  The 
vertices  of  the  graph  correspond  to  entities  (on  the  Web  or  in  the  world).  The  edges 
name  properties  that  those  entities  can  possess. 


EXAMPLE  1.3  Representing  RDF  Triples  as  a  Graph 

Here’s  the  graph  that  corresponds  to  the  triples  that  were  defined  in  Example  1.2: 
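
The same graph can be written down as a data structure. In the sketch below, each triple from Example 1.2 becomes one labeled, directed edge from its subject to its object. The names triples, build_graph, and vertices are assumptions made for illustration, and the bracketed birthplace structure is represented by a blank node that we arbitrarily call _:b1.

```python
from collections import defaultdict

# One (subject, predicate, object) tuple per triple of Example 1.2.
triples = [
    ("myco:bigreport", "dc:Creator", "myco:person521"),
    ("myco:person521", "rdf:type", "foaf:Person"),
    ("myco:person521", "foaf:firstName", "Chris"),
    ("myco:person521", "mystuff:favoritefood", "fooddb:chocolate"),
    ("fooddb:chocolate", "rdf:type", "fooddb:food"),
    ("fooddb:chocolate", "fooddb:caloriesperounce", 150),
    ("myco:person521", "places:birthplace", "_:b1"),   # blank node
    ("_:b1", "places:city", "places:Boston"),
    ("_:b1", "places:state", "places:MA"),
]


def build_graph(triples):
    """Map each subject to its outgoing (edge label, target vertex) pairs."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


graph = build_graph(triples)
# Every subject and every object is a vertex of the graph.
vertices = {s for s, _, _ in triples} | {o for _, _, o in triples}
```

The graph has ten vertices and nine edges, one edge per triple.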


It  is  perhaps  more  obvious  in  the  graph  representation  than  it  was  in  the  triples 
form  that  whenever  an  RDF  expression  exploits  a  term  that  has  been  defined  in  some 
other  namespace,  it  connects  with  what  is  already  known  in  that  other  namespace.  So, 
for  instance,  we  wrote  the  triple  that  asserted  that  chocolate  has  150  calories  per 
ounce.  If  that  fact  were  already  asserted  in  fooddb,  we  could  have  used  it  without 
explicitly  mentioning  it. 

Application programs can query RDF descriptions by writing their own code or by
using any of a number of query languages, many of which are very similar to the ones
that are used to query relational databases. The results that are returned in response to
a query will be the set of Web resources that match the query. So, for example, we'd
like “Chris” to be returned in response to a query that asked for the first name of the creator
of bigreport. We'd also like to be able to reason in less trivial ways with the facts
that have been provided. For example, I might want to find other people who live near
me and who like high-calorie food. In the next section we'll talk about the issues that
arise when we attempt to define reasoning engines that work with RDF.
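
A simple query, such as one asking for the first name of the creator of myco:bigreport, amounts to matching triple patterns against the stored triples. The sketch below shows the idea with hand-written code rather than a real query language. The function name match and the use of None as a wildcard are assumptions made for illustration.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every position that is not None."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


facts = [
    ("myco:bigreport", "dc:Creator", "myco:person521"),
    ("myco:person521", "rdf:type", "foaf:Person"),
    ("myco:person521", "foaf:firstName", "Chris"),
]

# Step 1: who created the report?  Step 2: what is that creator's first name?
creators = [o for _, _, o in match(facts, "myco:bigreport", "dc:Creator")]
first_names = [
    o for creator in creators
    for _, _, o in match(facts, creator, "foaf:firstName")
]
```

Chaining two pattern matches in this way plays the role that a join plays in a relational query.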

RDF  expressions  describe  properties  of  various  kinds  of  resources.  But  where  do 
those  RDF  expressions  reside  on  the  Web?  There  are  at  least  two  possible  answers. 
They may sit in separate files with distinct URIs that name them. Those files may be
private  or  they  may  be  publicly  available  and  searchable.  Another  possibility,  though, 
is that they may be embedded inside the objects that they describe. It is becoming increasingly
common for Web languages to provide a way to incorporate metadata, such
as RDF descriptions, inside objects that are written in those languages.

Let's now return to the question of RDF's vocabulary. As we said in the introduction
to our discussion of RDF, a key feature of the language is that there is not one universal,
fixed vocabulary. There are some standard vocabularies but anyone is free to
design  a  new  one  (such  as  fooddb  or  the  one  that  describes  my  company).  Then  any 
number  of  vocabularies  may  be  combined  to  form  a  single  RDF  description. To  use  a 
vocabulary,  all  that  is  required  is  that  you  know  the  URI  for  its  definition. 

So the next question is, “What does an RDF vocabulary definition look like and
what does it contain?” The answer is that it contains descriptions of classes of objects
and  the  properties  that  those  objects  may  have.  In  the  next  four  sections,  we  will  de¬ 
scribe  a  family  of  languages  in  which  such  descriptions  may  be  written.  An  important 
point  about  all  of  these  languages  is  that,  when  we  use  them,  we  don’t  just  write  a  list 
of  terms.  We  take  at  least  a  first  shot  at  defining  the  meanings  of  terms  by  relating 
them  to  other  terms. 

1.3.3  Defining  an  Ontology  and  an  Inference  Engine 

What  RDF  expressions  do  is  to  describe  objects  and  their  properties  (which  we 
called  predicates  when  wc  were  describing  triples).  Each  RDF  properly  corresponds 
to  a  relation  that  has  a  domain  and  a  range.  The  domain  is  the  set  of  things  to  which 
the  property  may  apply. The  range  is  the  set  of  values  that  the  property  may  have.  So. 
for  example,  we  might  want  to  define  the  property  org-name.  whose  domain  is  the 
set  of  organizations  and  whose  range  is  the  set  of  strings.  To  define  a  property,  we 
need  to  define  its  domain  and  its  range.  In  order  to  be  able  to  do  that,  we  must  also 
be  able  to  define  classes,  such  as  organizations.  And  we  need  to  make  sure  that  these 
definitions  are  constructed  in  a  way  that  makes  it  possible  to  reason  with  them.  For 
example, we must be able to answer the question, “Does object A, with properties P1
and P2, satisfy the description provided by query Q, which specifies properties P3 and
P4?” Note that it's not sufficient to do the trivial check of determining whether P1 is
identical to P3 and P2 is identical to P4. For example, P1 might be the property of
being a golden retriever and P3 might be the property of being a dog. Clearly P1
satisfies P3, even though they are not the same. So we must construct definitions that
support the kinds of inference that we want to be able to do.

At this point, anyone who is familiar with object-oriented programming, databases
or artificial intelligence (AI) observes, “This is familiar territory.” Object-oriented
programming languages, database schema languages and AI knowledge representation
languages are all designed to allow the definition of classes and their properties.
In object-oriented programming, the most important properties are methods, i.e.,
procedures that can be applied to members of a class. In the case of database schema
languages and AI knowledge representation languages, declarative properties play a more
important role. The job of these properties is to permit inference about both classes and
their elements. The design of a family of representation languages that can be used to
define the meaning of RDF expressions is based on a long tradition of work on
declarative representation languages.

But the World Wide Web environment creates some new challenges. For example,
inference engines in many database environments can reasonably make what is called
the closed world assumption. They assume that all the relevant objects and their
properties are present in the database. If an object or a property is missing from the
database, it is presumed not to exist in the world. For example, a program that is querying
an airline database can assume that, if a flight is not listed, it doesn't exist. In the
distributed knowledge environment of the World Wide Web, however, the closed world
assumption is rarely justified. For example, if I query the Web with my uncle's name
but fail to find him mentioned, it doesn't mean that he doesn't exist. We'll say that, in
the Web environment, we must often make the open world assumption.
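
The contrast between the two assumptions can be sketched in a few lines of Python. This is only an illustration: the fact base and the function names (KNOWN_FACTS, closed_world_query, open_world_query) are hypothetical, and a real inference engine does far more than a set lookup.

```python
from typing import Optional

# Hypothetical fact base: the facts that happen to be recorded.
KNOWN_FACTS = {("flight", "AA100"), ("flight", "UA200")}

def closed_world_query(fact: tuple) -> bool:
    """Closed world: a fact that is not recorded is presumed to be false."""
    return fact in KNOWN_FACTS

def open_world_query(fact: tuple) -> Optional[bool]:
    """Open world: a fact that is not recorded is merely unknown (None)."""
    return True if fact in KNOWN_FACTS else None

print(closed_world_query(("flight", "ZZ999")))   # False
print(open_world_query(("person", "my uncle")))  # None
```

Under the open world reading, a missing entry yields "unknown" rather than "no", which is why a reasoner for the Web cannot treat absence as negation.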

Another  important  difference  between  most  database  and  artificial  intelligence  sys¬ 
tems  on  the  one  hand,  and  the  World  Wide  Web  environment  on  the  other,  is  that  no 
one  individual  or  organization  has  sole  responsibility  for  defining  the  system.  We  will 
consider  some  of  the  important  implications  of  this  difference  below. 

An ontology, as used in the context of an automated reasoning system, is a formal
specification of the objects in a domain and their relationships to each other. An
ontology consists of a set of classes, typically arranged in a subclass/superclass hierarchy. It
may describe specific individuals that are members of those classes. It typically describes
properties, both of classes and of individuals. Those properties may include quite general
ones, like part-of, as well as more specific ones like caloriesperounce. The properties
themselves typically have properties, including, at a minimum, their domain and range. So
an RDF vocabulary is an ontology.

Despite the rich corpus of work that has been done in the broad area of knowledge-based
systems, the architects of the Semantic Web faced some hard choices as they set
about the task of building an ontology language that could support both defining the
Web's objects and reasoning with them. The perfect language would be expressively
powerful, defined by a clear formal semantics, decidable, computationally tractable, and
easy for people to use. Unfortunately, the key results that we presented in Theorem 22.4
and Theorem 28.16 again rear their ugly heads:


• First-order logic is attractive because it has a clear, formal semantics. It is
expressively powerful enough for many tasks. While it isn't expressively perfect, other, more
powerful logics are even less computationally tractable than first-order logic is.

• But first-order logic is not, in general, decidable (Theorem 22.4) and it is possible to
define theories that are incomplete (as shown by Gödel).

•  And  even  Boolean  logic  appears,  in  general,  to  be  intractable  (Theorem  28.16). 


So compromises are required. There isn't one set of compromises that is perfect for
all tasks. Yet standards are crucial if the potential of the Web as a shared resource is
going to be exploited. Three decisions have made it possible for the task of defining a
set of representation standards for the Semantic Web to proceed:

• There won't be a single ontology. For all the reasons that we have already
mentioned, individual users and user communities will be free to define ontologies that
suit their needs. To make this possible, though, there must be one or more standard
ontology-definition languages that those users can exploit.

•  Languages  will  be  defined  in  layers.  Rather  than  waiting  until  all  of  the  issues  and 
tradeoffs  can  be  resolved,  standards  for  parts  of  the  problem  will  be  released  as 
they are agreed upon. We've already described the lowest level, the URI language,
and the second level, RDF. After a brief introduction to description logics, we will
sketch  the  next  two  layers,  RDFS  and  OWL. 

• At the highest level(s), there is simply no one right answer for all problems. So
alternative, but upward compatible languages will be provided. Users who stick to the
less expressive subsets can be assured decidability and, in some cases, tractability.
Users  who  choose  to  use  the  more  expressive  languages  will  be  responsible  for  find¬ 
ing  domain-appropriate  compromises  that  deal  with  the  decidability  and  tractability 
issues  that  those  languages  present. 


I.3.4 Description Logics

The development path that we are about to describe, from RDFS to OWL, is based on
a knowledge representation framework called description logic (or DL).¹ Most DL
languages are sublanguages of first-order logic, tailored to the task of defining and
reasoning with classes, instances, and their properties. The most important reasoning
operations are typically:

• subsumption checking: Given the definitions of two classes, does one subsume the
other? (Class A subsumes class B iff every element of B is an element of A. Stated
another way, A subsumes B iff A is at least as general as B.)

• classification: Given a set of classes, arrange them, based on their descriptions, into
a subsumption graph. Think of a subsumption graph as a subclass/superclass hierarchy.
We showed one instance of such a graph in Example A.7.

•  realization:  Given  an  ontology  and  a  description  of  some  particular  entity  in 
terms  of  a  set  of  properties,  find  the  classes  to  which  the  entity  belongs.  In  other 
words,  find  the  classes  whose  definitions  are  consistent  with  the  definition  of  the 
entity. 

•  consistency/satisfiability  checking:  Given  a  set  of  two  or  more  descriptions,  are  they 
consistent?  Alternatively,  could  there  exist  a  nonempty  set  of  objects  that  satisfies 
all of those descriptions? Note that inconsistency checking is a special case of
subsumption checking in which the proposed subsumer is the empty set. Any set that is
subsumed by the empty set is also necessarily empty.
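
The set-based definitions above can be made concrete with a small Python sketch. It is deliberately simplified: each class is given extensionally, as a finite set of hypothetical instances, whereas a real DL reasoner works from class definitions and must consider every possible model. The class names and instances are invented for illustration.

```python
from itertools import permutations

# Hypothetical classes, each given as a finite set of instances.
CLASSES = {
    "animal": frozenset({"rex", "fido", "tom"}),
    "dog": frozenset({"rex", "fido"}),
    "golden_retriever": frozenset({"rex"}),
    "unicorn": frozenset(),
}

def subsumes(a: frozenset, b: frozenset) -> bool:
    """A subsumes B iff every element of B is an element of A."""
    return b <= a

def classify(classes: dict) -> set:
    """Return the (general, specific) edges of the subsumption graph."""
    return {(g, s) for g, s in permutations(classes, 2)
            if subsumes(classes[g], classes[s])}

def is_inconsistent(a: frozenset) -> bool:
    """A class is unsatisfiable iff the empty set subsumes it."""
    return subsumes(frozenset(), a)

print(("animal", "dog") in classify(CLASSES))   # True
print(is_inconsistent(CLASSES["unicorn"]))      # True
```

Note how the inconsistency test is written as a call to subsumes with the empty set as the proposed subsumer, mirroring the remark in the last bullet.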

¹ For a comprehensive treatment of description logics, see [Baader et al. 2003], particularly the first chapter:
[Nardi and Brachman 2003].

The details of the definition of a DL language matter. Depending on how classes,
properties,  and  instances  are  allowed  to  be  described,  a  DL  logic  may: 

•  share  the  problems  of  full  first-order  logic.  In  other  words,  it  may  be  undecidable 
and  it  may  allow  the  definition  of  theories  that  are  incomplete. 

•  be  decidable  but  apparently  intractable. 

• be decidable in polynomial time.

So what is the right DL language for the Semantic Web? There is no one right
answer. The tradeoff is between expressive power and the desirable computational
properties of decidability and tractability. Then what should a standards committee do? We
won't  attempt  to  answer  that  question.  But  we  will  describe  what  the  Semantic  Web 
committees  did  do: 

• They worked for several years without agreeing on a single common language. But
they  needed  to  release  a  standard  in  order  to  allow  work  on  defining  useful  ontolo¬ 
gies  to  proceed. 

•  So  they  released  the  definition  of  RDFS  (a  Resource  Description  Framework 
Schema  language).  Oversimplifying  a  complex  issue,  RDFS  is  merely  a  subset  of 
the  language  that  everyone  knew  would  eventually  be  required.  But  it  is  a  subset 
that  people  could  agree  on  after  only  a  few  years  of  discussion. 

•  Meanwhile,  work  continued  and  a  sequence  of  other  languages  were  defined  and 
standards  for  them  were  released.  The  ideas  from  some  of  the  earlier  such  lan¬ 
guages  evolved  into  the  OWL  family  of  languages.  OWL  has  three  dialects  that 
range  from  expressively  weak  and  computationally  easy  to  expressively  powerful 
and  computationally  undecidable.  Users  are  free  to  choose  to  use  exactly  the  lan¬ 
guage  features  that  they  need  and  no  more. 

In  the  next  two  sections,  we'll  sketch  the  definitions  of  RDFS  and  the  three  OWL 
dialects,  showing  how  the  issues  of  computability  and  tractability  influenced  those 
definitions. 


I.3.5 RDFS: A Resource Description Framework Schema Language

The Resource Description Framework Schema language, RDF Schema, or simply
RDFS, permits the definition of RDF vocabularies, i.e., classes, instances, and
properties that correspond to relations among them. RDFS programs are written in RDF
syntax. (In other words, they consist of sets of triples.) They may exploit the constructs that
are defined in the RDF namespace (rdf), as well as the concepts that are defined in
the RDFS namespace (rdfs). They may also, of course, exploit constructs that are
defined in any other namespace for which a URI is known.

The  mechanism  by  which  classes  and  properties  are  defined  in  RDFS  differs  in  one 
important  way  from  the  mechanism  by  which  they  are  defined  in  most  object-oriented 
systems,  including  most  database  schema  languages.  In  those  systems,  the  focus  is  on 
classes. Someone defines a class by listing its properties. In RDFS, on the other hand,
the focus is on properties. A user defines a property by specifying (using the domain
and  range  properties  defined  below)  the  classes  to  which  it  applies.  The  advantage 
of  this  property-oriented  approach  in  the  distributed  environment  of  the  World 
Wide Web is that classes are not “owned” by particular users. One user may declare
a  class  and  then  define  some  properties  whose  domain  and/or  range  is  equal  to  that 
class.  Another  user,  working  on  a  different  problem,  may  define  new  properties  that 
apply to that same class. So, for example, one user might define the property
caloriesperounce with domain fooddb:food and range number. Someone else, taking
a completely different point of view, might define the property wheninseason, also
with the domain fooddb:food but this time with range timeofyear. While one user
owns the URI for fooddb:food, neither user “owns” the definition of the class food.
Both  of  them  may  use  each  other's  properties  if  they  want  to  and  if  they  know  the 
URIs  of  the  resources  in  which  the  properties  are  defined. 
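
A minimal Python sketch of this property-oriented style follows. Each user's vocabulary is modeled as a set of (subject, predicate, object) triples. The namespace prefixes nutrition: and seasons: are hypothetical, introduced only to show two independent authors attaching properties to the same class; the helper properties_of is likewise an assumption.

```python
# Each vocabulary is a set of (subject, predicate, object) triples.
# The prefixes "nutrition:" and "seasons:" are hypothetical.
user_one = {
    ("nutrition:caloriesperounce", "rdfs:domain", "fooddb:food"),
    ("nutrition:caloriesperounce", "rdfs:range", "number"),
}
user_two = {
    ("seasons:wheninseason", "rdfs:domain", "fooddb:food"),
    ("seasons:wheninseason", "rdfs:range", "timeofyear"),
}

def properties_of(cls: str, triples: set) -> set:
    """Return every property whose declared domain is the class cls."""
    return {s for (s, p, o) in triples if p == "rdfs:domain" and o == cls}

# Combining vocabularies is just set union; neither user edits the class itself.
combined = user_one | user_two
print(sorted(properties_of("fooddb:food", combined)))
# ['nutrition:caloriesperounce', 'seasons:wheninseason']
```

Because a property names its class, rather than the class listing its properties, the second user can extend fooddb:food without touching anything the first user wrote.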

RDF and RDFS, between them, define some fundamental classes:

• rdfs:class: The class of all classes. The members of a class are called its instances.
Every class is an instance of rdfs:class. (So, in particular, rdfs:class is an
instance of itself.)

• rdfs:resource: The class that contains everything that an RDF program can talk
about. Every other class is both an instance and a subclass of rdfs:resource,
which, in turn, is an instance of rdfs:class. In other ontologies, this most general
class is typically called something like “thing”.

• rdf:property: The class of properties that can be used to define classes, instances,
and other properties.

• rdfs:literal: The class of literal values such as strings and numbers.

RDFS distinguishes between a class and the set of its instances. So, for example, the
class that contains all cats that reside at the White House may be different from the
class  that  contains  all  cats  owned  by  the  President,  even  if  those  two  classes  happen  to 
contain  the  same  instances. 

RDF and RDFS provide some built-in properties (i.e., instances of the class
rdf:property) that can be used in class and instance definitions. These include:

• rdfs:subClassOf: Relates two classes. If A is a subclass of B, then every instance
of A is also an instance of B. The subClassOf property (relation) is transitive, so if
A is a subclass of B, and B is a subclass of C, then every instance of A is also an
instance of C.

• rdf:type: Relates an instance and a class. If A rdf:type B, then A is an instance
of B. In other ontologies, this property is often called instance-of.
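
The two rules just stated (transitivity of subClassOf, and inheritance of membership through it) can be sketched as a small fixed-point computation in Python. This is an illustrative sketch, not the algorithm of any particular RDFS reasoner; the function name infer_types and the sample facts are assumptions.

```python
def infer_types(triples: set) -> set:
    """Return the triples plus the subClassOf and type triples they entail."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        sub = {(s, o) for (s, p, o) in inferred if p == "rdfs:subClassOf"}
        new = set()
        for (a, b) in sub:
            # Transitivity: A subClassOf B and B subClassOf C give A subClassOf C.
            for (b2, c) in sub:
                if b == b2:
                    new.add((a, "rdfs:subClassOf", c))
            # Membership: X type A and A subClassOf B give X type B.
            for (x, p, cls) in inferred:
                if p == "rdf:type" and cls == a:
                    new.add((x, "rdf:type", b))
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred

# Hypothetical facts for illustration.
facts = {
    ("golden_retriever", "rdfs:subClassOf", "dog"),
    ("dog", "rdfs:subClassOf", "animal"),
    ("rex", "rdf:type", "golden_retriever"),
}
closure = infer_types(facts)
print(("rex", "rdf:type", "animal") in closure)  # True
```

The loop repeats until no new triple appears, so chains of any length are followed.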

RDFS (unlike RDF) allows users to define new properties by, in turn, defining their
properties. Built-in properties that can be used to define other properties include:

• rdfs:subPropertyOf: So, for example, color might be declared to be a subproperty
of physical characteristic. The subPropertyOf property (relation) is transitive.

• rdfs:domain: If P rdfs:domain C, then the domain of the relation (property) P is
the class C. So the only resources that can possess property P are instances of the
class C. In terms of triples, this means that the subject of any triple whose predicate
is P must be an instance of C.

• rdfs:range: If P rdfs:range C, then the range of the relation (property) P is the class
C. So all values of the property P are instances of the class C. Again, in terms of triples,
this means that the object of any triple whose predicate is P must be an instance of C.
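
The domain and range conditions can be illustrated with a short Python check over a set of triples. The sketch follows the constraint reading given above: it reports triples whose subject or object is not recorded as an instance of the declared class. RDFS reasoners commonly use the same declarations in the other direction, to infer that a subject is an instance of C. The property employer, the individuals, and the function name violations are all hypothetical.

```python
def violations(triples: set) -> list:
    """List data triples whose subject or object breaks a declared domain or range."""
    schema_preds = {"rdf:type", "rdfs:domain", "rdfs:range"}
    domain = {s: o for (s, p, o) in triples if p == "rdfs:domain"}
    rng = {s: o for (s, p, o) in triples if p == "rdfs:range"}
    types = {(s, o) for (s, p, o) in triples if p == "rdf:type"}
    bad = []
    for (s, p, o) in sorted(triples):
        if p in schema_preds:
            continue
        if p in domain and (s, domain[p]) not in types:
            bad.append((s, p, o))  # subject is not an instance of the domain class
        elif p in rng and (o, rng[p]) not in types:
            bad.append((s, p, o))  # object is not an instance of the range class
    return bad

# Hypothetical data: employer relates a person to an organization.
data = {
    ("employer", "rdfs:domain", "person"),
    ("employer", "rdfs:range", "organization"),
    ("ann", "rdf:type", "person"),
    ("bob", "rdf:type", "person"),
    ("acme", "rdf:type", "organization"),
    ("ann", "employer", "acme"),
    ("bob", "employer", "ann"),   # object is not an organization
    ("rex", "employer", "acme"),  # subject is not known to be a person
}
print(violations(data))
# [('bob', 'employer', 'ann'), ('rex', 'employer', 'acme')]
```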

RDFS  also  provides  mechanisms  for  defining: 

•  Containers:  open  ended  structures  that  contain  other  resources.  Containers  may  be 
viewed  as  (unordered)  bags  (i.e.,  sets  with  duplicates  allowed),  as  sequences  (where 
order  matters),  or  in  some  other  way. 

•  Collections:  closed  structures  that  contain  other  resources.  A  collection  is  described 
as  a  list  of  its  members. 

An  RDFS  program,  just  like  an  RDF  program,  is  a  list  of  triples.  The  meaning  of 
such a program is to assert the conjunction of the assertions made by each of the
individual triples. So, viewed as a logical language, RDFS is first-order logic with some very
important  restrictions,  including: 

• Only binary predicates can be represented as triples. Properties that might
naturally be stated in first-order logic using unary predicates are typically described in
RDFS using the rdf:type property. So, for example, instead of saying
food(chocolate), in RDFS we say chocolate rdf:type food.

• The only logical connector that is allowed is conjunction. (And it is not written
explicitly. Rather all the triples are interpreted to be conjoined.) In particular,
negation, disjunction, and implication cannot be represented.

•  There  are  no  explicitly  quantified  variables. 

So  RDFS,  while  useful,  is  inadequate  for  many  practical  representation  and  reason¬ 
ing  tasks.  To  support  those  tasks,  additional  mechanisms  must  be  provided.  The  OWL 
family  of  languages,  to  be  described  next,  provides  some  of  those  mechanisms. 

On  the  other  hand,  RDFS,  as  a  logical  language,  allows  the  specification  of  theories 
that  are  incomplete  and  undecidable.  One  culprit  is  the  fact  that  a  class  may  also  be 
treated  as  an  individual.  So  a  set  (a  class)  may  be  both  a  subset  of  and  an  element  of  an¬ 
other set. Without a distinction between sets and elements, definitions such as this one,
known as Russell's paradox, are possible:

Let S be the set of all sets that are not members of themselves. Is S an element of
S? The answer to this question cannot be yes, since if S is an element of S, it fails
to meet the requirement for membership in S. But the answer also cannot be no,
since if S is not an element of S then it does meet S's requirement for
membership and it must therefore be in S.
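
Stated symbolically, the definition and the contradiction it produces are:

```latex
\[
S = \{\, x \mid x \notin x \,\}
\qquad\Longrightarrow\qquad
S \in S \iff S \notin S .
\]
```

The biconditional on the right has no consistent truth assignment, which is why a language that lets a class be a member of a class needs some other safeguard.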

The design of the OWL family of languages also addressed the desire to eliminate
this problem.

I.3.6 OWL

OWL, like RDFS, is a language for publishing and sharing ontologies on the World
Wide  Web.  OWL  is  designed  to  support  both  the  construction  of  ontologies  and  the  im¬ 
plementation  of  reasoning  engines  (theorem  provers)  that  can  reason  with  the  knowl¬ 
edge  that  its  ontologies  contain. 

Building  an  OWL  ontology  for  a  particular  task  may  be  substantially  easier  than 
building  that  ontology  in  a  more  traditional  environment  because  it  is  rarely  necessary 
to start from scratch. Many OWL ontologies already exist on the Web and new
ontologies can be built simply by adding to the existing ones.

An OWL ontology, just like an RDFS one, is simply a set of RDF triples. What OWL
offers,  that  RDFS  doesn't,  is  primarily: 

•  the  ability  to  express  more  complex  relationships  among  classes,  and 

•  the  ability  to  specify  more  precise  constraints  on  classes  and  their  properties. 

So, for example, in OWL, one can:

•  describe  constraints  on  the  number  of  values  that  a  property  may  possess  for  ob¬ 
jects  of  a  particular  class.  For  example,  a  person  can  have  only  one  mother  but  any 
number of children. In OWL, a property is “functional” iff each subject (i.e., an
element of the property's domain) has at most one value for the property. A property
is “inverse-functional” iff each object (i.e., an element of the property's range) may
be the value of the property for no more than one subject. So, for example,
USsocialsecuritynumber  is  a  functional  property  of  people  (since  each  person 
has  only  one).  It  is  also  inverse-functional  since  each  number  is  the  social  security 
number  of  at  most  one  person. 

•  describe  constraints  on  the  values  that  a  property  may  possess  for  objects  of  a  par¬ 
ticular  class.  (Note  that  this  is  different  from  specifying  the  range  of  the  property 
since  the  range  applies  to  the  property  regardless  of  the  class  of  the  individual  to 
whom the property applies. So, for example, the value of the mother property for a
person must be a person, while the value of the mother property of a cat must be a
cat, and so forth.)

•  write  statements  that  enable  the  system  to  infer  that  any  object  that  possesses  a 
given  set  of  property  values  is  necessarily  a  member  of  some  class. 

•  define  new  classes  in  terms  of  existing  classes  by  using  the  operations  union,  inter¬ 
section,  and  complement. 

•  assert  that  two  classes  are  necessarily  disjoint. 

•  define  a  class  by  enumerating  its  elements. 

• declare that a property is transitive, symmetric, one-to-one, one-to-many, or
many-to-one.

•  assert  equality  and  inequality. 
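
The definitions of functional and inverse-functional properties given in the first item of this list can be checked mechanically over a finite set of recorded (subject, value) pairs. The Python sketch below does only that; it is a simplification, since an OWL reasoner uses such declarations to draw conclusions rather than merely to test stored data. The function names and the sample data (including the placeholder numbers) are hypothetical.

```python
from collections import defaultdict

def is_functional(pairs: set) -> bool:
    """True iff each subject has at most one value for the property."""
    values = defaultdict(set)
    for subject, value in pairs:
        values[subject].add(value)
    return all(len(v) <= 1 for v in values.values())

def is_inverse_functional(pairs: set) -> bool:
    """True iff each value is the value of the property for at most one subject."""
    return is_functional({(value, subject) for subject, value in pairs})

# Hypothetical (subject, value) pairs.
mother = {("ann", "carol"), ("bob", "carol")}
ssn = {("ann", "000-00-0001"), ("bob", "000-00-0002")}

print(is_functional(mother), is_inverse_functional(mother))  # True False
print(is_functional(ssn), is_inverse_functional(ssn))        # True True
```

Here mother is functional (one mother each) but not inverse-functional (carol has two children), while the sample ssn data satisfies both conditions.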

In providing these abilities, OWL must give all its users the power they need for the
applications they are building while making, as strongly as possible, guarantees of
completeness, decidability and tractability.

Since expressive power trades off against completeness, decidability and
tractability, there is no single language that can meet all of these goals for all users (and their
applications).  So  OWL  is  not  a  single  language  but  a  family  of  upward  compatible 
languages: 

•  OWL  Full  is  the  most  expressively  powerful  of  the  OWL  dialects.  But  the  expres¬ 
siveness  comes  at  a  price.  A  particular  theory  that  is  expressed  in  OWL  Full  may  be 
consistent,  complete  and  decidable,  but  there  is  no  guarantee  that  it  is.  In  this  sense, 
OWL Full is analogous to first-order logic: There are first-order logic theories, such
as Presburger arithmetic, a theory of the natural numbers with plus as the only
operator, that are complete and decidable. But not every first-order logic theory is. It
is known that no decision procedure for OWL Full can exist.

•  OWL  DL  (where  the  DL  stands  for  Description  Logic)  is  a  compromise  between 
expressiveness  and  desirable  computational  properties.  More  specifically,  OWL  DL 
is intended to be a maximal subset of OWL Full for which not only a complete
inference procedure but also a decision procedure is known to exist. OWL DL
supports all of the language constructs of OWL Full, but it imposes some constraints
on how they are used.

•  OWL  Lite  is  expressively  the  weakest  dialect  of  OWL.  As  a  result,  its  inference  pro¬ 
cedure  guarantees  better  worst-case  performance  than  do  the  corresponding  pro¬ 
cedures  of  its  more  expressive  cousins.  It  is  designed  to  support  the  definition  of  a 
straightforward  ontology  based  on  a  hierarchy  of  classes  and  subclasses. 


OWL  Full  is  a  superset  of  RDFS.  OWL  Lite  and  OWL  DL  are  both  supersets  of  a 
subset  of  RDFS.  In  particular,  while  RDFS  allows  an  object  to  be  both  a  class  and  an 
instance,  neither  OWL  Lite  nor  OWL  DL  does  (and  so  the  Russell  paradox  would  not 
be  expressible  in  them). 

OWL DL achieves its completeness and decidability properties by imposing
constraints, including the following, on the way that the OWL vocabulary may be used.

•  Type  separation  is  enforced  (as  mentioned  above).  A  class  cannot  also  be  an  instance 
or a property. A property cannot also be an instance or a class.


• A property must either be an objectProperty (i.e., a relation between instances of
two  classes)  or  a  data  type  property  (i.e.,  a  relation  between  an  instance  of  a  class 
and  an  RDF  literal  or  a  built-in  XML  Schema  data  type).  It  may  not  be  both. 

•  No  cardinality  constraints  can  be  placed  on  transitive  properties  or  their  inverses  or 
any  of  their  superproperties. 


•  Statements  about  equality  and  inequality  can  only  be  made  about  named  individuals. 

The  fact  that  OWL  DL  is  decidable  (and  thus  consistency  checking  is  possible) 
makes it a useful tool in domains where consistency is critical. In the next section we'll
mention one system, GALEN, that takes advantage of this ability.

OWL  Lite  uses  a  subset  of  the  OWL  vocabulary.  It  imposes  all  the  same  constraints 
on vocabulary use that OWL DL does. And it imposes additional constraints, including:

• The only values that are allowed as cardinality constraints on properties are 0
and 1. So, for example, it is not possible to say that the number of members of a
soccer team must be at least 11.

• Classes cannot be defined using the union or complement operator (applied to
other classes). So, for example, it is not possible to define a class commonpet as the
union of the dog and cat classes.

•  Only  explicitly  named  classes  and  properties  may  be  used  with  the  intersection  op¬ 
erator  to  form  new  classes  or  properties. 

• Classes cannot be defined by enumerating their elements. So, for example, it is not
possible  to  define  the  class  dayofweek  by  listing  the  seven  days  of  the  week. 

•  Classes  cannot  be  asserted  to  be  the  same  or  to  be  disjoint. 

• It is not possible to assert that any instance of a given class must have some
particular value for a given property. So, for example, it is not possible to require that
every element of the class UScitizen must have the value United States for the
citizenof property.


I.3.7 Exploiting The Semantic Web

Metadata resources on the Web are growing daily. Many are based on RDF, RDF
Schema, and OWL, as we have described them here. Many are based on other
representational systems. Some use other languages derived from description logic; others use
languages that are more similar to relational databases. At the core of many of these efforts
is  the  development  of  common  ontologies  that  enable  entire  communities  of  users  to 
share  Web  resources.  We've  already  mentioned  a  few  of  these  ontologies,  for  example  the 
Friend  of  a  Friend  (foaf)  and  Dublin  Core  vocabularies.  We’ll  mention  a  few  more  here. 

The  need  to  share  data  is  acute  and  well-understood  within  the  biomedical  research 
community.  Several  ontology-construction  projects  have  been  driven  by  the  need  to 
make  this  possible.  For  example: 

• The objective of the Gene Ontology (GO) Consortium is to address the need for
consistent descriptions, in a species-independent way, of gene products in different
databases.  To  meet  this  need,  the  GO  project  has  developed  three  ontologies,  each 
of  which  enables  the  description  of  gene  products  from  one  important  perspective: 
associated  biological  processes,  cellular  components,  and  molecular  functions.  GO 
exploits  both  XML  and  RDF/XML. 

• The objective of the GALEN project is to make it easier to build useful clinical
(medical)  applications  by  providing  a  common  knowledge  base  of  clinical  terminol¬ 
ogy.  The  core  of  this  knowledge  base  is  an  ontology  that  is  intended  to  contain  all 
and  only  the  sensible  medical  concepts.  Dictionaries  (in  many  languages)  can  then 
connect  words  to  concepts.  The  first  implementation  of  GALEN  was  built  using  a 
representation language that was designed explicitly to support the GALEN
project. The GALEN ontology has since been translated into OWL. More specifically, it
exploits OWL DL. Because OWL DL is decidable, it is possible (although it may
require running overnight) to answer the question, “Is this version of the GALEN
ontology  consistent?” 

Consider the problem of adding location information to various kinds of Web
resources. For example, a user might like to find only those blogs that originate in some
particular location. There exists an RDF vocabulary for describing basic geographical
information in a way that would make that possible.

Consider  the  wide  range  of  physical  devices  that  are  used  to  access  the  World  Wide 
Web. They  range  from  computers  with  large  display  screens  and  full  sound  capability  to 
handheld  devices  that  are  used  in  settings  where  sound  is  not  appropriate  and  band¬ 
width  may  be  limited.  Further,  the  users  of  those  devices  have  their  own  preferences 
for  things  ranging  from  language  to  font  size.  One  way  to  make  it  possible  to  customize 
information  delivery  for  all  of  those  situations  is  for  clients  to  describe,  for  example  in 
RDF, their characteristics and preferences.

Thesauri  categorize  words  using  relationships  similar  to  the  ones  used  in  many  use¬ 
ful  kinds  of  ontologies.  So  it  may  be  natural  to  represent  thesaurus  information  in  an 
ontology language such as RDF or OWL. Wordnet is a large, online lexical
reference system that organizes English nouns, verbs, adjectives and adverbs into synonym
sets that correspond to underlying lexical concepts. Those lexical concepts are, in turn,
organized by relations such as hypernym (more general concept), hyponym (more
specific concept), and hasinstance. The Wordnet lexicon has been encoded in RDF/OWL
to  make  it  accessible  to  a  wide  assortment  of  other  metadata  projects. 


Appendix J
Applications: Security


In the modern world, security is an important feature of almost every system we
use. We protect physical locations with locks and burglar alarms. We protect computers
with safe operating systems and sophisticated virus checkers. We protect
sensitive communications by encrypting them. The theory described in this book has
something to say about all of those techniques.


J.1  Physical  Security  Systems  as  FSMs 

Imagine a conventional intrusion-detection security system of the sort that is found in
all kinds of buildings, including houses, offices, and banks. Such systems can naturally
be modeled as finite-state machines.

Some intrusion-detection systems are complex: They may, for example, divide the
region that is being protected into multiple zones. Then the state of each zone may be
partially or completely independent of the states of the other zones. But we can easily
see the essential structure of such systems by considering the simple DFSM shown in
Figure J.1. The inputs to this FSM are user commands and timing events: arm (turn on
the system), disarm (turn off the system), query the status of the system, reset the
system, open a door, activate the glass-break detector, and 30 seconds elapse. The job
of this machine is to detect an intrusion. So we have labeled the states that require an
alarm as accepting states. State 6 differs from state 1 since it displays that an alarm
has occurred since the last time the system was reset and it will not allow the system
to be armed until a reset occurs.

FIGURE J.1 A simple physical security system.

FIGURE J.2 The code-entering fragment of a security system.

A realistic system has many more states. For example, suppose that alarm codes
consist of four digits. Then the single transition from state 1 to state 2 is actually a
sequence of four transitions, one for each digit that must be typed in order to arm the
system. Suppose, for example, that the alarm code is 9999. Then we can describe the
code-entering fragment of the system as the DFSM shown in Figure J.2.

Note that we have not specified what happens if the query button is pushed in states
A-C. One of the questions that the system designer must answer is whether the query
function is allowed in the middle of an arming sequence.
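
The code-entering fragment can be sketched as a small transition function. The Python below is a minimal sketch rather than a transcription of Figure J.2: it assumes intermediate states named A, B, and C between states 1 and 2, assumes that a wrong digit sends the machine back to state 1, and leaves the query input out entirely, mirroring the open design question.

```python
# Minimal sketch of the code-entering fragment as a DFSM.
# Assumptions (not taken from the figure): the alarm code is 9999,
# a wrong digit returns the machine to state "1", and further key
# presses after arming leave the machine in state "2".
CODE = "9999"
STATES = ["1", "A", "B", "C", "2"]  # "2" is the armed state


def step(state: str, digit: str) -> str:
    """Return the next state after one key press."""
    if state == "2":
        return "2"
    position = STATES.index(state)   # number of correct digits so far
    if digit == CODE[position]:
        return STATES[position + 1]  # one more correct digit
    return "1"                       # hypothetical policy: start over


def run(digits: str, state: str = "1") -> str:
    """Feed a string of digits to the machine and return the final state."""
    for digit in digits:
        state = step(state, digit)
    return state
```

For example, run("9999") ends in state 2, while run("999") stops in state C, one digit short of arming.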


J.2 Computer System Safety

Consider a complex computer system. It includes files that some users, but not others,
have access to. It includes processes (like print pay checks) that some users are allowed
to run but most are not. Is it decidable whether such a system is safe? For example, is it
decidable, given the operations that are possible in the system, whether an unauthorized
user could acquire access to the paycheck printing system? The answer to this
question depends, of course, on the operations that are allowed.

To build a model of the protection status of a system, we'll use three kinds of entities:

•  Subjects:  Active  agents,  generally  processes  or  users. 

•  Objects: Resources that the agents need to exploit. These could include files, processes,
devices, etc. Notice that processes can be viewed both as subjects (entities capable of
doing things) and as objects (entities that other entities may want to invoke).

•  Rights: Capabilities that agents may have with respect to the objects. Rights could
include read access, write access, delete access or execute access for files, execute
access for processes, edit, compile, or execute access for source code, check or
change access for a password file, and so forth.

FIGURE J.3 A simple access control matrix.


We can describe the current protection status of a system with an access control matrix,
A, that contains one row for each agent and one column for each protected object.
Each cell of this matrix contains the set of rights that the agent possesses with respect
to the object. Figure J.3 shows a simple example of such a matrix.

The  protection  status  of  a  system  must  be  able  to  evolve  along  with  the  system.  We’ll 
assume  the  existence  of  the  following  primitive  operations  for  changing  the  access  matrix: 

•  Create subject (x) records the existence of a new subject x, such as a new user or
process.

•  Create object (x) records the existence of a new object x, such as a process or a file.

•  Destroy subject (x).

•  Destroy object (x).

•  Enter r into A[s, o] gives subject s the right r with respect to object o.

•  Delete r from A[s, o] removes subject s's right r with respect to object o.
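
The six primitives can be sketched as methods on a small class. This is a minimal sketch under stated assumptions: the class name AccessMatrix, the method names, and the choice to store the matrix as a dictionary of sets are all illustrative, as is the decision to treat every subject as an object too.

```python
# Minimal sketch of an access control matrix with the six primitive
# operations. Names and storage layout are illustrative assumptions.
class AccessMatrix:
    def __init__(self):
        self.subjects = set()
        self.objects = set()
        self.cells = {}  # (subject, object) -> set of rights

    def create_subject(self, x):
        # Assumption: each subject is also an object, so that other
        # subjects can hold rights to it.
        self.subjects.add(x)
        self.objects.add(x)

    def create_object(self, x):
        self.objects.add(x)

    def destroy_subject(self, x):
        self.subjects.discard(x)
        self.destroy_object(x)  # drop its column
        self.cells = {k: v for k, v in self.cells.items() if k[0] != x}

    def destroy_object(self, x):
        self.objects.discard(x)
        self.cells = {k: v for k, v in self.cells.items() if k[1] != x}

    def enter(self, r, s, o):
        # Only existing subjects and objects have cells.
        if s in self.subjects and o in self.objects:
            self.cells.setdefault((s, o), set()).add(r)

    def delete(self, r, s, o):
        self.cells.get((s, o), set()).discard(r)

    def rights(self, s, o):
        return self.cells.get((s, o), set())
```

Storing only the nonempty cells keeps the sketch short; a dense two-dimensional table would serve equally well for small examples.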

We will allow commands to be constructed from these primitives, but all such commands
must be of the following restricted form:

command-name(x1, x2, ..., xn) =
   if r1 in A[..., ...] and
      r2 in A[..., ...] and
      ...
      rj in A[..., ...]
   then
      operation1
      operation2
      ...
      operationm

In other words, the command may check that particular rights are present in selected
cells of the access matrix. If all conditions are met, then the operation sequence is
executed. All the operations must be primitive operations as defined above. So no
additional tests, loops, or branches are allowed. The parameters x1, x2, ..., xn must each
be bound to some subject or some object. The rights r1, r2, ..., rj are hard-coded into the
definition of a particular command.
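
The restricted command form can be sketched as a fixed list of tests followed by a fixed list of primitive operations. In this minimal sketch the helper run_command and the example command grant_read are hypothetical, the matrix is a plain dictionary from (subject, object) pairs to sets of rights, and only the enter and delete primitives are supported.

```python
# Minimal sketch of the restricted command form. All names are
# illustrative assumptions; create/destroy primitives are omitted.
def run_command(matrix, conditions, operations):
    """Run one command against matrix.

    conditions: (right, subject, object) triples that must all hold.
    operations: ("enter" or "delete", right, subject, object) steps.
    Returns True iff every condition held and the body was executed.
    """
    if not all(r in matrix.get((s, o), set()) for r, s, o in conditions):
        return False
    for op, r, s, o in operations:  # the body is a straight-line sequence
        if op == "enter":
            matrix.setdefault((s, o), set()).add(r)
        elif op == "delete":
            matrix.get((s, o), set()).discard(r)
    return True


def grant_read(matrix, owner, friend, file):
    """Hypothetical command: if owner owns file, give friend read access."""
    return run_command(
        matrix,
        conditions=[("own", owner, file)],
        operations=[("enter", "read", friend, file)],
    )
```

The rights named in the conditions and in the body ("own" and "read" here) are fixed in the command's definition; only the subjects and objects are parameters.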
Define a protection framework to be a set of commands that have been defined as
described  above  and  that  are  available  for  modifying  an  access  control  matrix.  Define 
a  protection  system  to  be  a  pair  (init,  framework).  Init  is  an  initial  configuration  that  is 
described  by  an  access  control  matrix  that  contains  various  rights  in  its  cells.  Framework 
is  a  protection  framework  that  describes  the  way  in  which  the  rights  contained  in  the 
matrix  can  evolve  as  a  result  of  system  events. 

In designing a protection framework, our goal is typically to guarantee that certain
subjects maintain control over certain rights to certain objects. We will say that a right
has leaked iff it is added to some access control matrix cell that did not already contain
it. We will say that a protection system is safe with respect to some right r iff there is no
sequence of commands that could, if executed from the system's initial configuration,
cause r to be leaked. We'll say that a system is unsafe iff it is not safe. Note that this
definition of safety is probably too strong for most real applications. For example, if a
process creates a file it will generally want to assign itself various rights to that file. That
assignment of rights should not constitute leakage. It may also choose to allocate some
rights to other processes. What it wants to be able to guarantee is that no further transfer
of unauthorized rights will occur. That narrower definition of leakage can be
described in our framework in a couple of ways, including the ability to ask about leakage
from an arbitrary point in the computation (e.g., after the file has been created and
assigned initial rights) and the ability to exclude some subjects (i.e., those who are
“trusted”) from the matrix when leakage is evaluated. For simplicity, we will consider
just the basic model here.

•  Given a protection system S = (init, framework) and a right r, is it decidable whether
S is safe with respect to r?


It  turns  out  that  if  we  impose  an  additional  constraint  on  the  form  of  the  commands  in 
the  system  then  the  answer  is  yes.  Define  a  protection  framework  to  be  mono-operational 
iff  the  body  of  each  command  contains  a  single  primitive  operation.  The  safety  question 
for mono-operational protection systems is decidable. But such systems are very limited.
For  example,  they  do  not  allow  the  definition  of  a  command  by  which  a  subject  creates  a 
file  and  then  gives  itself  some  set  of  rights  to  that  file. 

So we must consider the question of decidability of the more general safety question.
Given an arbitrary protection system S = (init, framework) and a right r, is it decidable
whether S is safe with respect to r? Now the answer is no, which we can prove by reduction
from Hε = {<M> : Turing machine M halts on ε}. The proof that we are about to
present was originally given in [Harrison, Ruzzo, and Ullman 1976], which was concerned
with protection and security in the specific context of operating systems. It is also
presented, in the larger context of overall system security, in [Bishop 2003].

The  key  ideas  in  the  proof  are  the  following: 


•  It is possible to encode the configuration of an arbitrary Turing machine M as an
access control matrix we'll call A. To do this will require a set of “rights” as follows:

   •  One for each element of M's tape alphabet.

   •  One for each state of M. These must be chosen so that there is no overlap with
      the ones that correspond to tape alphabet symbols. Let qf be the
      right that corresponds to any of M's halting states.

We call these objects “rights” (in quotes) because, although we will treat them
like rights in a protection system, they are not rights in the standard sense. They do
not represent actions that an agent can take. They are simply symbols that will be
manipulated by the reduction.

Each square of M's tape that is either nonblank or has been visited by M will
correspond to one cell in the matrix A. The cell that corresponds to square i of M's
tape will contain the “right” that corresponds to the current symbol on square i of
the tape. In addition, the matrix will encode the position of M's read/write head and
its state. It will do that by containing, in the cell that is currently under the read/write
head, the “right” that corresponds to M's current state.

•  It is possible to describe the transition function of a Turing machine as a protection
framework (a set of commands, as described above, for manipulating the access
control matrix).

•  So the question, “Does M ever enter one of its halting states when started with an
empty tape?” can be reduced to the question, “If A starts out representing M's initial
configuration, does a symbol corresponding to any halting state ever get inserted
into any cell of A?” In other words, “Has any halting state symbol leaked?”

So, if we could decide whether an arbitrary protection system is safe with respect to
an arbitrary right r, we could decide Hε. But we know, from Theorem 21.1, that Hε is
not in D.

The only question we are asking about M is whether or not it halts. If it halts, we
don't care which of its halting states it lands in. So we will begin by modifying M so that
it has a single halting state qf. The modified M will enter qf iff the original M would
enter any of its halting states. Now we can ask the specific question, “Does qf leak?”

To make it easier to represent M's configuration as an access control matrix, we will
assume that M has a one-way (to the right) infinite tape, rather than our standard, two-way
infinite tape. By Theorem 17.5, any computation by a Turing machine with a two-way
infinite tape can be simulated by a Turing machine with a one-way infinite tape, so
this assumption does not limit the generality of the result that we are about to present.

To see how a configuration of M is encoded as an access control matrix, consider the
simple example shown in Figure J.4 (a). M is in state q5 and we assume that it started on
the blank just to the left of the beginning of the input, so there are four nonblank or
examined squares on M's tape. This configuration will be represented as the square
access control matrix A, shown in Figure J.4 (b). A contains one row and one column for
each tape square s that is nonblank or has been visited:

Notice that, primarily, only cells along A's major diagonal contain any rights. The
cell A[i, i] contains the “right” that corresponds to the contents of tape square i. Since
the read/write head is on square 3, A[3, 3] also contains the “right” corresponding to
the current state, q5. The only other “rights” encode the sequential relationship
among the squares on the tape. If si immediately precedes sj, then si “owns” sj. Finally,
the cell that corresponds to the right-most nonblank or visited tape square contains
the “right” end.

FIGURE J.4 Representing a Turing machine configuration as an access control matrix.

It remains to show how the operation of M can be simulated by commands that
modify A. Given a particular M, we can construct a set of such commands that exactly
mimic the moves that M can make. For example, suppose that, in state q5 reading b, M
writes an a, moves left, and goes to state q6. We construct the following command:


stateq5readingb(x1, x2) =
   if own in A[x1, x2] and        /* This command can only apply to two adjacent
                                     tape squares, where
      q5 in A[x2, x2] and         /* the one to the right is currently under the
                                     read/write head and M is in q5, and
      b in A[x2, x2]              /* there is a b under the read/write head.
   then
      delete q5 from A[x2, x2]    /* Remove the old state info
      delete b from A[x2, x2]     /* and the current symbol under the read/write head.
      enter a into A[x2, x2]      /* Write the new symbol under the read/write head.
      enter q6 into A[x1, x1]     /* Move the read/write head one square to the left
                                     and go to state q6.
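
To see the command act on a concrete matrix, it can be sketched in Python. This is an illustrative sketch only: the matrix is stored as a dictionary from (row, column) pairs to sets of “rights”, and the tape contents (blank, a, b, b, with the head on square 3 in state q5) are a hypothetical example, not a transcription of Figure J.4.

```python
# Sketch of the command stateq5readingb acting on an access control
# matrix stored as a dict from (row, column) to a set of "rights".
def state_q5_reading_b(A, x1, x2):
    """Apply the command to squares x1 and x2; return True if it fired."""
    head = A.get((x2, x2), set())
    if "own" not in A.get((x1, x2), set()) or "q5" not in head or "b" not in head:
        return False
    head.discard("q5")                       # remove the old state
    head.discard("b")                        # remove the symbol under the head
    head.add("a")                            # write the new symbol
    A.setdefault((x1, x1), set()).add("q6")  # head moves left; M is now in q6
    return True


# Hypothetical tape: squares 1..4 hold blank, a, b, b; head on square 3.
A = {
    (1, 1): {"blank"}, (2, 2): {"a"}, (3, 3): {"b", "q5"}, (4, 4): {"b", "end"},
    (1, 2): {"own"}, (2, 3): {"own"}, (3, 4): {"own"},
}
fired = state_q5_reading_b(A, 2, 3)
```

After the call, cell (3, 3) holds only a and cell (2, 2) holds a together with q6, which encodes the head sitting one square to the left in the new state. Binding the parameters to any other pair of squares fails the tests, so the command does nothing.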


We must construct one such command for every transition of M. We must also construct
commands that correspond to the special cases in which M tries to move off the
tape to the left and in which it moves to the right to a previously unvisited blank tape
square. The latter condition occurs whenever M tries to move right and the current
tape square has the right end. In that case, the appropriate command must first create
a new object and a new subject corresponding to the next tape square.

The simulation of a Turing machine M begins by encoding M's initial configuration as
an access control matrix. For example, suppose that M's initial configuration is as shown
in Figure J.5(a). Then we let A be the access control matrix shown in Figure J.5(b).

FIGURE J.5 Encoding an initial configuration as an access control matrix.

There are a few other details that we must consider. For example, since we are going
to test whether qf ever gets inserted into A during a computation, we must be sure that
qf is not in A in the initial configuration. So if M starts in qf, we will first modify it so
that it starts in some new state and then makes one transition to qf.

Notice that we have constructed the commands in such a fashion that, if M is deterministic,
exactly one command will have its conditions satisfied at any point. If M is
nondeterministic then more than one command may match against some configurations.

We can now show that it is undecidable whether, given an arbitrary protection system
S = (init, framework) and right r, S is safe with respect to r. To do so, we define the
following language and show that it is not in D:

•  Safety = {<S, r> : S is safe with respect to r}.

THEOREM J.1 “Is S Safe with Respect to r?” is Undecidable

Theorem: The language Safety = {<S, r> : S is safe with respect to r} is not in D.

Proof: We show that Hε = {<M> : Turing machine M halts on ε} ≤ Safety and
so Safety is not in D because Hε isn't. Define:

R(<M>)  = 

1.  Make  any  necessary  changes  to  M : 

1.1. If M has more than one halting state, then add a new unique halting
state qf and add transitions that take it from each of its original
halting states to qf.

1.2. If M starts in its halting state qf, then create a new start state that
simply reads whatever symbol is under the read/write head and
then goes to qf.

2. Build S:

2.1. Construct an initial access control matrix A that corresponds to M's
initial configuration on input ε.

2.2. Construct a set of commands, as described above, that correspond
to the transitions of M.

3. Return <S, qf>.
{R, ¬} is a reduction from Hε to Safety. If Oracle exists and decides Safety,
then C = ¬Oracle(R(<M>)) decides Hε. R and ¬ can be implemented as Turing
machines. And C is correct. By definition, S is unsafe with respect to qf iff qf is
not present in the initial configuration of A and there exists some sequence of
commands in S that could result in the initial configuration of S being transformed
into a new configuration in which qf has leaked, i.e., it appears in some
cell of A. Since the initial configuration of S corresponds to M being in its initial
configuration on a blank tape, M does not start in qf, and the commands of S simulate
the moves of M, this will happen iff M reaches state qf and so halts. Thus:

•  If <M> ∈ Hε: M halts on ε, so qf eventually appears in some cell of A. S is unsafe
with respect to qf, so Oracle rejects. C accepts.

•  If <M> ∉ Hε: M does not halt on ε, so qf never appears in any cell of A.
S is safe with respect to qf, so Oracle accepts. C rejects.

But no machine to decide Hε can exist, so neither does Oracle.
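
The effect of the commands that R builds can be illustrated by running a machine through the matrix encoding and watching for qf. The Python below is a rough sketch under several assumptions: the transition-table format delta, the name "blank" for the blank symbol, and the step bound are all hypothetical, and the sketch collapses the per-transition commands into one loop that remembers the head position in a variable. The bound is what makes it terminate, and it also means the function can confirm a leak but can never confirm safety.

```python
# Sketch: run a one-way-tape Turing machine through the access control
# matrix encoding and watch for the "right" qf. The machine format and
# the step bound are assumptions made for illustration.
BLANK = "blank"


def qf_leaks(delta, start, max_steps):
    """delta maps (state, symbol) to (new_state, written_symbol, move),
    with move "L" or "R". Returns True if qf appears in the matrix within
    max_steps moves, and None (no verdict) otherwise.
    """
    states = {start, "qf"} | {s for s, _ in delta} | {t[0] for t in delta.values()}
    A = {(1, 1): {BLANK, start, "end"}}   # initial configuration on ε
    head = 1                              # shortcut: remember the head's square
    for _ in range(max_steps):
        if any("qf" in rights for rights in A.values()):
            return True
        cell = A[(head, head)]
        state = next(r for r in cell if r in states)
        symbol = next(r for r in cell if r not in states and r != "end")
        if (state, symbol) not in delta:
            return None                   # no command applies
        new_state, written, move = delta[(state, symbol)]
        cell.discard(state)               # delete the old state
        cell.discard(symbol)              # delete the old symbol
        cell.add(written)                 # enter the new symbol
        if move == "R":
            if "end" in cell:             # extend the tape by one square
                cell.discard("end")
                A[(head, head + 1)] = {"own"}
                A[(head + 1, head + 1)] = {BLANK, "end"}
            head += 1
        elif head > 1:
            head -= 1                     # assumption: stay put at the left edge
        A[(head, head)].add(new_state)    # enter the new state
    return True if any("qf" in rights for rights in A.values()) else None
```

A machine that halts after one move, such as {("q0", "blank"): ("qf", "blank", "R")}, yields True. A machine that moves right forever yields None for every bound, which is exactly the situation in which no bounded search can stand in for Oracle.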


Does the undecidability of Safety mean that we should give up on proving that systems
are safe? No. There are restricted models that are decidable. And there are specific
instances of even the more general model that can be shown to have specific
properties. This result just means that there is no general solution to the problem.


J.3 Cryptography

Effective encryption systems, or the lack of them, have changed the course of history.
Modern techniques for encoding sensitive financial information have enabled the explosion
of electronic commerce. Throughout history, the evolution of cryptographic
systems has been a game of cat and mouse: as code breaking techniques were developed,
new encoding methods had to be developed.

Before computers, any useful cryptographic scheme was necessarily computationally
trivial. It had to be because both senders and receivers implemented their algorithms
by hand. With the advent of computers, things changed. Senders and receivers, as well
as enemies and eavesdroppers, all have access to substantial and equivalent computational
resources. But computational complexity is still important. What is now required
is a scheme with two properties: There must exist efficient algorithms for encoding a
message and for decoding it by the intended recipient. And there must not exist an efficient
algorithm for decoding the message by anyone else. The facts about computing
with prime numbers, as we describe them in Part V, provide the basis for a system that
possesses both of these properties.

In all but the simplest cryptographic systems, the algorithms that will be used, both
for encryption and decryption, are fixed and known to everyone. But those algorithms
take two inputs, the text to be encoded or decoded and a key. In a symmetric
key system, the sender and receiver use the same key. Symmetric key systems suffer from
the following problem: No message can be sent unless there has been some prior agreement
on a key. Even if there has been such an agreement, if the same key is used over
an extended period of time, an eavesdropper may be able to infer the key and break
the code. For example, the eavesdropper might be able to collect statistics on the frequency
of letter combinations in the encoded text and compare them to frequencies
in typical unencoded texts in order to infer relationships between the two. But in
order to change keys, there must be some way to transmit new keys (securely) between
senders and receivers.
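
The frequency attack can be illustrated on a deliberately weak symmetric system. The sketch below uses a shift (Caesar) cipher, which is an example chosen here rather than one named in the text, and it counts single letters instead of letter combinations. It assumes lowercase English plaintext in which "e" is the most common letter; the function names are illustrative.

```python
# Sketch of a frequency attack on a shift (Caesar) cipher, a toy
# symmetric key system. Assumes lowercase English text in which "e"
# is the most common letter; real attacks use richer statistics.
from collections import Counter
import string

ALPHABET = string.ascii_lowercase


def shift_encrypt(text: str, key: int) -> str:
    """Shift each lowercase letter forward by key positions."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) + key) % 26] if c in ALPHABET else c
        for c in text
    )


def guess_key(ciphertext: str) -> int:
    """Guess the key by assuming the most frequent letter encodes "e"."""
    counts = Counter(c for c in ciphertext if c in ALPHABET)
    most_common_letter = counts.most_common(1)[0][0]
    return (ALPHABET.index(most_common_letter) - ALPHABET.index("e")) % 26
```

Once Eve has guessed the key k, she decrypts by shifting with 26 - k. The guess fails on short or unusual texts, which is why a practical attack would compare whole frequency profiles rather than a single letter.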

Public key systems, first introduced in the 1970s, get around all of those problems.
The most widely used public key system is the RSA algorithm [Rivest, Shamir and
Adleman 1978]. Following convention, we'll assume that Bob and Alice wish to exchange
secure messages and that Eve is attempting to eavesdrop. We'll call the original
(unencrypted) text the plaintext and the encrypted text the ciphertext. The most general
way to describe RSA, and related algorithms, is as follows:

Assume  that  Alice  wants  to  send  a  message  to  Bob.  Then: 

1. Bob chooses a key, private, known only to him. This key may need to possess
some specific mathematical properties in order to be effective, so Bob may need
to exploit a function choose that guarantees to return an appropriate private
key. Bob exploits a function f to compute his public key, public = f(private).

2. Bob publishes public (either completely publicly or by sending it, unencrypted,
to Alice).

3. Alice exploits Bob's public key to compute ciphertext = encrypt(plaintext,
public) and she sends ciphertext to Bob.

4. Bob exploits his private key to compute plaintext = decrypt(ciphertext,
private). In order for this last step to work, encrypt and decrypt must be designed
so that one is the inverse of the other.

If there exist efficient algorithms for performing all four of these steps, then Bob
and Alice will be able to exchange messages. But what about Eve? Might she also be
able to decrypt Alice's message? We assume that Eve knows the algorithms encrypt
and decrypt. So she could easily eavesdrop if she could infer Bob's private key from his
public one or if she could compute decrypt without knowing Bob's private key. The
RSA algorithm exploits the mathematical properties of modular arithmetic and the
computational properties of prime numbers to guarantee that Bob and Alice can perform
their tasks efficiently but Eve cannot.

Alice uses the RSA algorithm to send a message to Bob as follows. Assume that all
messages are represented as binary strings. Let j (mod k), read “j modulo k,” mean the
remainder when j is divided by k. The greatest common divisor of two numbers i and j,
written gcd(i, j), is the largest number that evenly divides both i and j. Then we have:

1. Bob constructs his public and private keys:

1.1. Bob chooses two large prime numbers p and q. From them, he computes
n = p·q.

1.2. Bob finds a value e such that 1 < e < p·q and gcd(e, (p - 1)·(q - 1)) = 1.
(In other words, he finds an e such that e and (p - 1)·(q - 1) are
relatively prime.)

1.3. Bob computes a value d such that d·e (mod (p - 1)·(q - 1)) = 1. In
RSA terminology, this value d, rather than the original numbers p and
q, is referred to as Bob's private key.

2. Bob publishes (n, e) as his public key.

3. Alice breaks her message plaintext into segments such that no segment corresponds
to a binary number that is larger than n. Then, for each plaintext segment,
Alice computes ciphertext = plaintext^e (mod n). Then she sends ciphertext
to Bob.

4. Bob recreates Alice's original message by computing plaintext = ciphertext^d
(mod n).
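
The four steps above can be sketched end to end with toy numbers. This is a minimal sketch, not a usable implementation: the function names are illustrative, the primes are tiny, and real systems add padding and other safeguards that are omitted here. The private exponent d is found with an extended form of Euclid's algorithm.

```python
# Toy RSA sketch with tiny primes. Function names are illustrative;
# real keys use very large primes and add padding.
from math import gcd


def extended_gcd(a: int, b: int):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y


def make_keys(p: int, q: int, e: int):
    """Return ((n, e), d) for primes p, q and a valid exponent e."""
    n, phi = p * q, (p - 1) * (q - 1)
    if not (1 < e < n and gcd(e, phi) == 1):
        raise ValueError("need 1 < e < p*q and gcd(e, (p-1)*(q-1)) == 1")
    _, x, _ = extended_gcd(e, phi)
    d = x % phi  # smallest positive d with d*e = 1 (mod phi)
    return (n, e), d


def encrypt(plaintext: int, public) -> int:
    n, e = public
    return pow(plaintext, e, n)  # built-in modular exponentiation


def decrypt(ciphertext: int, n: int, d: int) -> int:
    return pow(ciphertext, d, n)
```

With p = 19, q = 31, and e = 49, make_keys returns the public key (589, 49) and d = 529. Any d that is congruent to 529 modulo 540, such as 1069, decrypts equally well.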

The  RSA  algorithm  is  effective  because: 

•  The functions encrypt and decrypt are inverses of each other. The proof follows
from Euler's generalization of Fermat's Little Theorem (as described in Section
30.2.4). The generalization is called Euler's Totient Theorem or sometimes just
Euler's Theorem.

•  Bob can choose primes efficiently using the following algorithm:

1. Randomly choose two large numbers as candidates.

2. Check the candidates to see if they are prime. This can be done efficiently using
a randomized algorithm, as described in Section 30.2.4. There is a tiny chance
that a nonprime could be thought to be prime, but the probability of this happening
can be reduced so that it is substantially lower than the probability of a
transient hardware failure causing an error in the transmission process.

3. Repeat steps 1 and 2 until two primes have been chosen. By the Prime
Number Theorem, the probability of a number near x being prime is
about 1/ln x (where ln is the natural logarithm, i.e., the log base 2.71828...,
of x). So, for example, suppose Bob wants to choose a 1000 bit number. The
probability of a randomly chosen number near 2^1000 being prime is about
1/693. So he may have to try 1000 or so times for each of the two numbers
that he needs.


•  Bob can check gcd efficiently (using Euclid's algorithm, as described in Example
27.6), so he can compute e.

•  Bob can compute d efficiently, using an extension of Euclid's algorithm that exploits
the quotients that it produces at each step.

•  Alice can implement encrypt efficiently. It is not necessary to compute plaintext^e
and then take its remainder mod n. Modular exponentiation can be done directly
by successive squaring, as shown in the example below.

•  Similarly, Bob can implement decrypt efficiently.

•  Eve can't recreate plaintext because:

   •  She can't simply invert encrypt because modular exponentiation isn't invertible.
      She could try every candidate plaintext and see if she gets one that produces
      ciphertext, but there are too many of them for this to be feasible.

   •  She can't compute d from n and e. No efficient algorithm for factoring n into p and
      q is known, so she can't solve the problem that way. And if there were some other
      way for her to compute d efficiently, that algorithm could be used as an efficient
      algorithm for computing p and q. And again, no such efficient algorithm is known.
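
The prime-selection procedure in the list above can be sketched as follows. The sketch uses the Miller-Rabin test as its randomized primality check; that choice, the 40 rounds, and the function names are assumptions made here, and the randomized algorithm of Section 30.2.4 may differ in detail. The random module is used for brevity, whereas a real key generator would draw candidates from a cryptographically secure source.

```python
# Sketch of random prime selection with a Miller-Rabin test. The choice
# of Miller-Rabin and of 40 rounds is an assumption made for illustration.
import random


def is_probable_prime(n: int, rounds: int = 40) -> bool:
    """Return False if n is composite, True if n is probably prime."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13):
        if n % small == 0:
            return n == small
    d, s = n - 1, 0
    while d % 2 == 0:  # write n - 1 as d * 2**s with d odd
        d, s = d // 2, s + 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


def random_prime(bits: int) -> int:
    """Try random odd candidates of the given bit length until one passes."""
    while True:
        candidate = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate
```

Forcing the top and bottom bits fixes the bit length and skips even candidates, which roughly halves the expected number of tries compared with the estimate in the text.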


EXAMPLE  J.1  The  RSA  Algorithm 

We  can  illustrate  the  RSA  algorithm  with  a  simple  message  from  Alice  to  Bob.  In 
practice,  messages  will  be  longer  and  keys  should  be  large  numbers.  We'll  use 
short  ones  here  so  that  it  is  easier  to  see  what  is  going  on. 

1.  Bob  is  expecting  to  receive  messages.  So  he  constructs  his  keys  as  follows: 

1.1. He chooses two prime numbers, p = 19 and q = 31. He computes
n = p·q = 589.

1.2. He finds an e that has no common divisors with 18·30 = 540. The e he
selects is 49.

1.3. He finds a value d = 1069. Notice that 1069·49 = 52,381. Bob needs to
assure that the remainder, when 52,381 is divided by 540, is 1. And it is:
52,381 = 540·97 + 1. Bob's private key is now 1069.

2.  Bob  publishes  (589, 49)  as  his  public  key. 

3. Alice wishes to send the simple message “A.” The ASCII code for A is 65. So
Alice computes 65^49 (mod 589). She does this without actually computing 65^49.
Instead, she exploits the following two facts:

   n^(i+j) = n^i · n^j.

   (n·m) (mod k) = (n (mod k) · m (mod k)) (mod k).

Combining these, we have:

   n^(i+j) (mod k) = (n^i (mod k) · n^j (mod k)) (mod k).

So, to compute 65^49, first observe that 49 can be expressed in binary as 110001.
So 49 = 1 + 16 + 32. Thus 65^49 = 65^(1+16+32). The following table lists the
required powers of 65:

   65^1 (mod 589) = 65.
   65^2 (mod 589) = 4225 (mod 589) = 102.
   65^4 (mod 589) = 102^2 (mod 589) = 10404 (mod 589) = 391.
   65^8 (mod 589) = 391^2 (mod 589) = 152881 (mod 589) = 330.
   65^16 (mod 589) = 330^2 (mod 589) = 108900 (mod 589) = 524.
   65^32 (mod 589) = 524^2 (mod 589) = 274576 (mod 589) = 102.

So we have that:

   65^49 (mod 589) = 65^(1+16+32) (mod 589)
                   = (65^1 · 65^16 · 65^32) (mod 589)
                   = ((65^1 (mod 589)) · (65^16 (mod 589)) · (65^32 (mod 589))) (mod 589)
                   = (65 · 524 · 102) (mod 589)
                   = ((34060 (mod 589)) · 102) (mod 589)
                   = (487 · 102) (mod 589)
                   = 49674 (mod 589)
                   = 198.

Alice  sends  Bob  the  message  198. 

4. Bob uses his private key (1069) to recreate Alice's message by computing
198^1069 (mod 589). Using the same process Alice used, he does this efficiently
and retrieves the message 65.
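
The successive-squaring process used in this example can be sketched as a short loop over the bits of the exponent. The function name mod_exp is illustrative; Python's built-in pow(base, exponent, modulus) computes the same value.

```python
# Sketch of modular exponentiation by successive squaring, following
# the binary expansion of the exponent (49 is 110001 in binary).
def mod_exp(base: int, exponent: int, modulus: int) -> int:
    result = 1 % modulus
    square = base % modulus  # base^(2^0) mod modulus
    while exponent > 0:
        if exponent & 1:     # this power of two occurs in the exponent
            result = (result * square) % modulus
        square = (square * square) % modulus
        exponent >>= 1
    return result
```

Here mod_exp(65, 49, 589) returns 198 and mod_exp(198, 1069, 589) returns 65, matching the values that Alice and Bob compute. The loop runs once per bit of the exponent, so no intermediate value ever exceeds the square of the modulus.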


For the details of the mathematical claims that have just been made, as well as some
additional points that should be considered in choosing good keys, see any good cryptography
book, for example [Trappe and Washington 2006].


J.4 Hackers and Viruses

In this section, we’ll briefly touch on two other network security issues. The first is virus
detection. We’ll see that the undecidability results that we proved in Chapter 21 tell us
that the definitive virus detector cannot exist. The second involves the difference between
the average-case and worst-case time complexity of some important algorithms.
This difference may allow hackers to launch denial of service attacks and to observe
“secret” behavior of remote hosts.


J.4.1 Virus Detection

Given a known computer virus V, consider the problem of detecting an infection by V.
The most straightforward approach to solving this problem is just to scan incoming
messages for the text <V>. But viruses can easily evade this technique by altering
their text in ways that have no effect on the computation that V performs. So, for example,
source code could be modified to add blanks in meaningless places or to add
leading 0s to numbers. Executable code could be modified by adding jump instructions
that just jump to the next instruction.

So the practical virus detection problem must be stated as, “Given a known virus V
and an input message M, does M contain the text of a program that computes the same
thing V computes?” By Theorem 21.8, we know that the equivalence question is undecidable
for Turing machines. Using that result, we showed, in Theorem 21.12, that the
equivalence question for arbitrary programs is also undecidable. So there exists no algorithm
that can, in the general case, decide whether a program P, contained in some
message M, is equivalent to a given virus V.
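To make the evasion concrete, here is a small Python sketch of a naive text scanner. Everything in it is hypothetical: the "signature" is a harmless stand-in for the text <V>, and `naive_scan` and `normalized_scan` are names we made up. It shows that a scanner looking for exact text misses a copy with extra blanks, and that patching the scanner for one disguise does nothing about the others.

```python
SIGNATURE = "total = price + 10"     # hypothetical stand-in for the text <V>


def naive_scan(message, signature=SIGNATURE):
    """Report an infection only if the exact signature text appears."""
    return signature in message


def normalized_scan(message, signature=SIGNATURE):
    """Collapse runs of whitespace before comparing (defeats one disguise only)."""
    squeeze = lambda text: " ".join(text.split())
    return squeeze(signature) in squeeze(message)


original = "x = 1\ntotal = price + 10\n"
padded = "x = 1\ntotal   =   price  +  10\n"     # same computation, extra blanks
reordered = "x = 1\ntotal = 10 + price\n"        # same computation, operands swapped

print(naive_scan(original), naive_scan(padded))              # True False
print(normalized_scan(padded), normalized_scan(reordered))   # True False
```

Each new normalization rule handles one family of disguises; the undecidability result says that no finite collection of such rules can handle all of them.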

So we can’t solve the virus problem by making a list of the known viruses and comparing
new code to them. What about going the other way? Suppose that, instead of
making a list of forbidden operations, we allowed users to define a “white list” of the
operations that are to be allowed to run on their machines. Then the job of a virus filter
is to compare incoming code to the operations on the white list. Any code that is equivalent
to some allowed operation can be declared safe. But now we have exactly the
same problem. No test for equivalence exists.


J.4.2  Exploiting  the  Difference  Between  the  Worst  Case  and  the 
Average  Case 

Some widely used algorithms have the property that their worst-case time complexity
is significantly different from their average-case time complexity. For example:

• Looking up an entry in a hash table may take, on average, constant time. But if all
the entries collide and hash to the same table location, the time required becomes
O(n), where n is the number of entries in the table.

• Looking up an entry in a binary search tree may take, on average, O(log n) time.
But the tree may become unbalanced. In the worst case, it becomes a list and
lookup time again becomes O(n).

• Matching regular expressions (often called regexes) of the sort that are supported
by Unix utilities and programming languages like Perl may take close to constant
time on average. But these regex languages allow expressions that are not regular
expressions in the sense in which we defined them in Chapter 6. Any of the regular
expressions that we considered there can be converted to a finite state machine
that can be guaranteed to perform a match in linear time. But the added flexibility
that is provided in the practical tools (see Appendix O for a description of one of
them) means that languages that are not regular can be defined. So no finite state
machine can be built to accept them. In the worst case, matching some of these patterns
may require a backtracking search and so the time required may be exponential
in the length of the input string.
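The binary search tree case is easy to reproduce. The following Python sketch (our own illustration, using a plain unbalanced tree with hypothetical helper names) builds one tree from keys that arrive in sorted order and another from the same keys in shuffled order, and then counts how many nodes a lookup visits in each.

```python
import random


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into an unbalanced binary search tree; return the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root


def lookup_steps(root, key):
    """Count the nodes visited while searching for key."""
    steps, node = 0, root
    while node is not None:
        steps += 1
        if key == node.key:
            break
        node = node.left if key < node.key else node.right
    return steps


n = 1000
keys = list(range(n))
sorted_tree = build(keys)                  # adversarial order: tree degenerates to a list
shuffled = keys[:]
random.Random(0).shuffle(shuffled)
random_tree = build(shuffled)              # typical order: depth stays small

print(lookup_steps(sorted_tree, n - 1))    # visits all 1000 nodes
print(lookup_steps(random_tree, n - 1))    # visits far fewer
```

An attacker who controls the order in which keys arrive can therefore force the slow case deliberately.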

Hackers can exploit these facts. For example:

• One way to launch a denial of service attack against a target site S is to send to S a
series of messages/requests that has been crafted so that S will exhibit its worst-case
performance. If S was designed so that it could adequately respond to its traffic in
the average case, it will no longer be able to do so.

• One way to get a peek inside a site S and observe properties that were not intended
to be observable is to time it. For example, it is sometimes possible to observe
the time required by S to perform decryption or password checking and so to infer
its private key or a stored password.
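The timing idea in the second bullet can be illustrated without a stopwatch by counting the work a comparison does. In this Python sketch, `leaky_equals` and the stored password are hypothetical; the function stops at the first mismatch, so the number of character comparisons reveals how long a correct prefix the guess has.

```python
def leaky_equals(secret, guess):
    """Compare character by character, stopping at the first mismatch.

    Returns (match, comparisons) so that the amount of work is visible.
    """
    comparisons = 0
    if len(secret) != len(guess):
        return False, comparisons
    for s, g in zip(secret, guess):
        comparisons += 1
        if s != g:
            return False, comparisons
    return True, comparisons


secret = "swordfish"                       # hypothetical stored password
for guess in ["xxxxxxxxx", "swxxxxxxx", "swordxxxx"]:
    print(guess, leaky_equals(secret, guess))
```

The three guesses require 1, 3, and 6 comparisons respectively, so an observer who can measure the running time can recover the password one character at a time. A common defense is a comparison whose running time does not depend on where the mismatch occurs, such as `hmac.compare_digest` in the Python standard library.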


APPENDIX  K 

Applications:  Computational  Biology 


Proteins and DNA, the building blocks of every living organism, can naturally be
described as strings. In Section 18.2.9, we described an experiment in DNA computing.
In it, synthesized DNA molecules are treated as strings and operations
on them are used to solve a simple graph search problem. Of more practical interest, at
least so far, is the fact that significant sets of real DNA and protein molecules can be
modeled as languages. So, not surprisingly, several of the techniques (including FSMs,
regular expressions, HMMs, and context-free grammars) that have been described in
this book play an important role in modern computational biology. In Section 5.12.1,
we described the use of an HMM to model a problem in population genetics. In this
chapter, we will discuss several other application areas.


K.1  A  (Very)  Short  Introduction  to  Molecular  Biology 
and  Genetics 

We begin this chapter with a very short introduction to the biological concepts that are
required for an understanding of the way that the computational models we have discussed
are being used by biologists. We skip many important details. For more information,
follow the Web links suggested here, or consult [Alberts et al. 2002] or any good
modern text on molecular biology.


K.1.1  Proteins 

Proteins are the building blocks of living organisms. A protein is a large molecule that
is composed of a sequence of amino acids. There are 20 amino acids that occur in
proteins. They are shown in Table K.1, along with their standard, one-letter symbols.

Amino acids are typically divided into classes: hydrophobic (h-phob), hydrophilic
(h-phil), and polar, with the polar molecules further divided into positively (pos) and
negatively (neg) charged. The class to which an amino acid belongs can have an effect
on its function in a protein. Table K.1 shows the class to which each amino acid belongs.

Table K.1 Amino acids.

Amino acid      Sym  Class    Amino acid      Sym  Class    Pattern symbols               Symbol
Alanine         A    h-phob   Leucine         L    h-phob   Aspartic acid or asparagine   B
Arginine        R    pos      Lysine          K    pos      Glutamine or glutamic acid    Z
Asparagine      N    h-phil   Methionine      M    h-phob   Any amino acid                X
Aspartic acid   D    neg      Phenylalanine   F    h-phob
Cysteine        C    h-phil   Proline         P    h-phob
Glutamine       Q    h-phil   Serine          S    h-phil
Glutamic acid   E    neg      Threonine       T    h-phil
Glycine         G    h-phob   Tryptophan      W    h-phob
Histidine       H    pos      Tyrosine        Y    h-phil
Isoleucine      I    h-phob   Valine          V    h-phob

It  also  shows  a  set  of  symbols  that  are  sometimes  used  in  specifying  patterns  of  amino 
acid  sequences. 

Amino acids share a common chemical structure. Each contains a carbon atom, to
which is attached an amino group (NH2), a carboxyl group (COOH), and a side chain,
also called the functional group. Amino acids contain different functional groups, and it
is that part of the molecule that causes them to behave differently. Amino acids combine
to form proteins when the amino group of one amino acid molecule bonds with
the carboxyl group of another, releasing one molecule of water (H2O) and forming a
bond called a peptide linkage. For example, three amino acid molecules, joined by two
peptide linkages, would look roughly as shown in Figure K.1 (ignoring the details of
what the peptide linkages, shown as ovals, actually look like, and letting the ?’s represent
the functional groups of each of the amino acids).

The  part  of  each  amino  acid  that  remains  after  peptide  bonds  have  been  formed  is 
called  an  amino  acid  residue.  So,  to  be  exact,  a  protein  is  a  sequence  of  amino  acid 
residues.  For  simplicity,  however,  proteins  are  often  described  simply  as  sequences  of 
amino  acids  and  we  will  follow  that  convention. 

The sequence of amino acids that makes up a protein is called the protein's primary
structure. If each amino acid is assigned a distinct symbol, as shown in the table above,
then the primary structure of a protein can be described as a string. So, for example, the
string QTS corresponds to the sequence Glutamine, Threonine, Serine. Notice that the
two ends of the sequence illustrated below are different. There is an amino (NH2)
group on one end and a carboxyl (COOH) group on the other. If we adopt the convention
that a protein molecule will be described with its amino group on the left, then
there is a unique string that corresponds to each protein.
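As a small illustration of treating primary structure as a string, the following Python sketch maps one-letter symbols back to amino acid names. The dictionary simply restates the 20 symbols of Table K.1; the function name `spell_out` is our own.

```python
AMINO_ACIDS = {
    "A": "Alanine", "R": "Arginine", "N": "Asparagine", "D": "Aspartic acid",
    "C": "Cysteine", "Q": "Glutamine", "E": "Glutamic acid", "G": "Glycine",
    "H": "Histidine", "I": "Isoleucine", "L": "Leucine", "K": "Lysine",
    "M": "Methionine", "F": "Phenylalanine", "P": "Proline", "S": "Serine",
    "T": "Threonine", "W": "Tryptophan", "Y": "Tyrosine", "V": "Valine",
}


def spell_out(protein):
    """Expand a protein string (amino end first) into amino acid names."""
    return [AMINO_ACIDS[symbol] for symbol in protein]


print(spell_out("QTS"))   # ['Glutamine', 'Threonine', 'Serine']
```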

Proteins vary in size. The smallest ones may contain fewer than a hundred amino
acids. The largest may contain thousands. A typical protein may contain between 300
and 500 of them.


FIGURE K.1 Putting amino acids together to form proteins.

While a protein can be described as a two-dimensional (primary) structure that is
simply a chain of amino acids, every physical protein also has a three-dimensional
structure that is formed as the amino acid chain folds and wraps around itself. This
three-dimensional shape is called the protein’s secondary structure. The secondary
structure of a protein is determined by its primary structure, as well as by environmental
factors, such as temperature. Each protein has a natural secondary structure, and it
must exhibit that structure in order to perform its function within a living organism.
Sometimes, when the structure is broken, as for example, by changing the temperature,
it can be rebuilt if the natural environment is restored. Sometimes, however, it cannot.
For example, the proteins in a cooked egg cannot be “uncooked” and returned to their
natural structure. Proteins that have formed abnormal secondary structures will typically
behave abnormally. For example, it is believed that an accumulation of abnormally
shaped proteins called prions is responsible for causing mad cow disease.

The work of a protein is done at some number of specific locations called functional
sites. It is there that other molecules can attach to the protein. In order for a protein to do
its job, it must be folded so that its functional sites are exposed and the sites themselves
must exist (i.e., contain a sequence of amino acids with the chemical properties that are
necessary for whatever job the site is required to perform). But it turns out that some
variations in the exact amino acid sequence that makes up a protein molecule can be tolerated
without affecting the ability of the protein to function correctly. Such variation can be
introduced by mutations. So if we examine a particular protein, for example the blood
protein hemoglobin, in multiple organisms, we will find similar but not identical molecules.

The similarity among related molecules can often be described as a set of motifs,
where a motif is a relatively short region that has been conserved (left unchanged)
by the evolutionary process. If the same motif occurs in very different organisms, then
it is likely that it is significant, in the sense that it corresponds to a sequence whose
structure is necessary in order for the protein to function properly.

K.1.2  DNA 

DNA is the blueprint for living organisms. Each molecule of DNA is composed of two
strands, held together by weak hydrogen bonds and arranged as a helix. Each of the
strands is made up of a sequence of nucleotides, each of which in turn is composed of
three parts: deoxyribose (a sugar), a phosphate group, and one of four bases, shown in
Table K.2, along with the symbols that are used to represent them.

The four bases are divided into two chemical classes, also shown in the table below:
adenine and guanine are purines, while cytosine and thymine are pyrimidines.
Each base has a complement, which is a particular base in the other class. So A and T are
complements, as are C and G. When a double strand of DNA is examined as a sequence
of base pairs (one nucleotide from each strand), each base is paired with its complement.
Figure K.2 shows a fragment of a DNA molecule.
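The complement relation is simple enough to state as code. The following Python sketch (the names `COMPLEMENT`, `complement_strand`, and `base_pairs` are ours) computes, position by position, the strand that would pair with a given strand; it ignores the chemical direction of the strands.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand):
    """Return the base that pairs with each position of the strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def base_pairs(strand):
    """List the base pairs formed by a strand and its complement."""
    return list(zip(strand, complement_strand(strand)))


print(complement_strand("ATTCG"))   # TAAGC
print(base_pairs("ACG"))            # [('A', 'T'), ('C', 'G'), ('G', 'C')]
```

Because each base has exactly one complement, either strand of a double helix determines the other.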


Table K.2 The nucleotides that make up DNA.

Base       Symbol   Class
Adenine    A        purine
Thymine    T        pyrimidine
Cytosine   C        pyrimidine
Guanine    G        purine

FIGURE K.2 A DNA double helix.


The  sequence  of  base  pairs  in  the  DNA  of  an  individual  is  called  the  individual's 
genome.  Since  the  DNA  of  individuals  from  the  same  species  is  almost  identical,  the 
genome  of  an  individual  can  be  considered  to  be  representative  of  the  species,  so  we 
can  also  talk  about  the  genome  of  a  species. 

DNA molecules encode the program (set of instructions) that an organism uses to
manufacture the proteins that it needs. Since a protein is a sequence of amino acids, the
program that builds it can be encoded as a sequence of subprograms, one for each
amino acid in the sequence. There are 20 amino acids and only four different nucleotides.
A sequence of two nucleotides could distinguish only $4^2 = 16$ amino acids,
so it takes a sequence of three nucleotides to specify a single amino acid. Such
a sequence of three nucleotides is called a codon. There are $4^3 = 64$ different codons
and only 20 amino acids, so there is redundancy in the way that amino acids are specified.
Some amino acids are described by more than one codon. So, in particular, note
that some changes to a codon will have no effect on the protein that the codon defines.
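The counting argument and the redundancy can both be checked with a short Python sketch. `PARTIAL_CODE` below is only a small fragment of the standard genetic code (seven of the 64 codons), chosen by us for illustration, and `translate` is a hypothetical helper that assumes its input consists only of codons in that fragment.

```python
from itertools import product

BASES = "ACGT"
codons = ["".join(triple) for triple in product(BASES, repeat=3)]
print(len(BASES) ** 2, len(codons))   # 16 64

# Fragment of the standard genetic code (DNA codon -> one-letter amino acid symbol).
PARTIAL_CODE = {
    "GGA": "G", "GGC": "G", "GGG": "G", "GGT": "G",   # glycine: four codons
    "TTC": "F", "TTT": "F",                           # phenylalanine: two codons
    "ATG": "M",                                       # methionine: one codon
}


def translate(dna):
    """Translate a DNA string, three bases at a time, using PARTIAL_CODE."""
    return "".join(PARTIAL_CODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


print(translate("ATGTTTGGA"))   # MFG
print(translate("ATGTTCGGG"))   # MFG
```

The two DNA strings differ in two positions, yet they specify the same three amino acids.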

A  sequence  of  codons  that  contains  the  blueprint  for  a  protein  or  some  other  important 
molecule  (such  as  RNA)  is  called  a  gene  and  is  said  to  code  for  that  protein  or  molecule. 

The DNA of an individual is organized as a set of double-helix strands called
chromosomes. The human genome, for example, is arranged into 46 chromosomes.
Sexually reproducing organisms are diploid, meaning that the chromosomes in all but the
egg and sperm cells occur in pairs. Each organism inherits one member of each pair
from each parent. Generally both members of the pair contain the same sequence of
genes, although there may be exceptions. For example, humans have 23 chromosome
pairs, of which 22 are matching. In addition, females have a pair of X chromosomes,
while males have one X and one Y.


Differences between individuals within a species are the result of differences in their
genes (as well as differences in environmental factors). When a gene occurs in more than
one form, those forms are called alleles. So, for example in humans, there are three alleles
(called A, B, and O) of a gene that codes for an important blood protein. Each person
possesses two genes for this blood protein (one from each parent). Those two genes may
be the same or they may be different. So each person’s genotype (the actual genes they
possess) for this trait must be one of the six values: AA, AB, AO, BB, BO, and OO. (Order
doesn’t matter.) Individuals with two identical genes (i.e., two genes that correspond to
the same allele) are called homozygous with respect to that gene. Individuals with two
different genes are called heterozygous with respect to that gene.

The observable traits of an individual represent its phenotype. The phenotype is
determined by the genotype in a variety of ways. Sometimes a single gene is responsible
for determining a trait. Sometimes several genes play a role. Sometimes one allele is
dominant while others are recessive. In that case, in heterozygous individuals, the dominant
allele is expressed and determines the observable trait, while the recessive allele
has no effect, although it can be passed on to offspring. For example, people have a gene
that determines whether their earlobes will be attached to their skull or hang freely. The
free earlobes allele is dominant and the attached earlobes allele is recessive. So anyone
with attached earlobes must be homozygous and possess two genes for attached earlobes.
Fortunately, many disease-causing alleles, for example the one that causes cystic
fibrosis, are recessive. But some, for example the allele that causes Huntington’s disease, are
dominant. Not all observed traits are determined by the simple dominant/recessive
model. For example, in the case of the ABO blood protein, none of the alleles is dominant.
Any individual who possesses an A gene will have red blood cells with antigen A
on the surface. An individual with a B gene will have red blood cells with antigen B on
the surface. A person with one of each will produce both antigen A and antigen B. A person
with neither the A nor the B gene (i.e., someone who is homozygous with two O
genes) will produce neither antigen. So there are four phenotypes: A (corresponding to
the genotypes AA and AO), B (corresponding to the genotypes BB and BO), AB (corresponding
to the genotype AB), and O (corresponding to the genotype OO).
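The ABO example can be enumerated mechanically. This Python sketch (with names of our own choosing) lists the six unordered genotypes and maps each one to its phenotype using the rule stated above: an antigen is produced exactly when the corresponding allele is present.

```python
from itertools import combinations_with_replacement

ALLELES = "ABO"

# Unordered pairs of alleles: order doesn't matter, so AB and BA are the same genotype.
genotypes = ["".join(pair) for pair in combinations_with_replacement(ALLELES, 2)]
print(genotypes)   # ['AA', 'AB', 'AO', 'BB', 'BO', 'OO']


def phenotype(genotype):
    """Return the blood type: the antigens present, or 'O' if there are none."""
    antigens = "".join(allele for allele in "AB" if allele in genotype)
    return antigens or "O"


for g in genotypes:
    print(g, "->", phenotype(g))
```

Running the loop groups the six genotypes into the four phenotypes A, B, AB, and O.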

While genes are the key to the function of DNA, most of the DNA that is present in
the chromosomes of living creatures codes for nothing. For example, about 97% of the
human genome is noncoding. A small amount of that noncoding DNA appears to serve
some function, for example in regulating the activity of the coding regions. But we do
not know what, if any, function is served by most noncoding DNA. Noncoding DNA is
important when we compare DNA sequences across related organisms since mutations
in nonessential DNA can occur without affecting the fitness of the organism. So,
while functional DNA sequences are more likely to be conserved across individuals
within a species and across related species, other segments may vary, sometimes substantially.
These variations make it possible to do DNA testing to identify individuals.
They can also be used to infer genetic closeness of species: The more changes there are
in the DNA sequences, the longer ago the species shared a common ancestor.

K.1.3  RNA 

RNA is chemically very similar to a single strand of DNA. There are two important
differences:

• The four bases that are present in RNA nucleotides are adenine (A), guanine (G),
cytosine (C) and uracil (U). The first three are also present in DNA. The last, uracil,
occurs in place of thymine. C and G are complementary (just as they are in DNA).
A and U are also complementary.

• RNA nucleotides contain a different sugar molecule (ribose) than do those in DNA.

In a living cell, RNA plays several important roles, including:

• Messenger RNA transports the encoding of a protein from the cell’s nucleus
(where the DNA is) to the site of protein synthesis.

• Transfer RNA transports individual amino acid molecules to the building site during
protein synthesis.

• Ribosomal RNA makes up a substantial part of the ribosomes, the cell’s protein
factories.

• Catalytic RNA functions like an enzyme and is involved in a variety of cell functions.

FIGURE K.3 An RNA molecule folding.

RNA  molecules,  unlike  DNA  ones,  do  not  form  double  strands.  But  a  single  RNA 
strand  does  fold  around  itself,  creating  a  secondary  structure  that  is  important  to  the 
function  of  the  molecule.  In  particular,  if  two  subsequences  that  contain  complementary 
bases  fold  so  that  they  align  next  to  each  other,  they  form  hydrogen-bonded  base  pairs 
in  much  the  same  way  that  two  DNA  strands  do.  We  call  these  bonded  subsequences 
stems. The unaligned subsequences between the stems will then form loops, and unaligned
subsequences at the end will simply hang out as tails.

Consider,  for  example,  the  RNA  sequence  AACCCACUGUAAAUCUCGUCCCACUCC.  It  is 
likely  to  fold  to  form  the  structure  shown  in  Figure  K.3.  The  lines  indicate  hydrogen 
bonds  between  complementary  bases  in  a  stem.  In  this  example,  a  stem  containing  six 
base  pairs  has  formed.  Stems  generally  arrange  themselves  in  a  helix,  in  much  the  same 
way  that  the  paired  strands  in  a  DNA  molecule  do. 
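Whether two subsequences can form a stem is a simple string test: one must be the complement of the other read backwards. The Python sketch below uses a short, made-up hairpin sequence (not the one in Figure K.3) and a hypothetical helper `forms_stem`, and it considers only the A-U and C-G pairings mentioned above.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def forms_stem(left, right):
    """True if left pairs base by base with right read backwards,
    as happens when a single strand folds back on itself."""
    return len(left) == len(right) and all(
        RNA_COMPLEMENT[a] == b for a, b in zip(left, reversed(right))
    )


rna = "GGGACUUCGGUCCC"                     # hypothetical hairpin: stem, loop, stem
left, loop, right = rna[:5], rna[5:9], rna[9:]
print(left, loop, right)                   # GGGAC UUCG GUCCC
print(forms_stem(left, right))             # True
print(forms_stem("AAAA", "AAAA"))          # False
```

Here the first five bases pair with the last five, and the four bases in between are left over to form the loop.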


K.1.4 Genetics and Evolution

The genomes of living creatures are under constant evolutionary pressure from three
natural stochastic processes:

• mutation, which occurs when DNA is imperfectly copied during reproduction,

• natural selection, which occurs when fitter (i.e., better-adapted) individuals have
higher survival and reproduction rates than their less well-adapted cousins, and

• genetic drift, which occurs when the relative frequencies of competing alleles
change, either as a result of sampling bias or as the result of random (i.e., not based
on fitness) events. Sampling bias is particularly likely to occur in small populations
(and many species exist in relatively small, isolated populations). Suppose that
there exists a gene with two alleles, a and A, that occur with frequency .5 each. Assuming
sexual reproduction, one gene from each parent is passed on to the next
generation. That gene is chosen at random from the two that the parent possesses.
It is likely that in the next generation, the relative frequencies of the two alleles will
not be exactly .5 but will instead be something like .4955 or .5045. At the next generation,
it could become slightly more weighted. And so forth. If it ever reaches 0/1,
there is no going back. Genetic drift also occurs when some individuals are selected
more because they are lucky than because they are fit. Suppose, for example, that
a fire sweeps through a portion of the natural habitat of a population
and all the individuals in the fire-ravaged area are destroyed. A small group that
happened to be lucky enough to be outside the burned area is left to reproduce.
There is no guarantee that the distribution of alleles in that population is identical
to the distribution in the original larger group.
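The sampling effect behind genetic drift can be simulated directly. The Python sketch below rests on a simplifying assumption of ours, not a model given in the text: each of the 2N gene copies in the next generation is drawn independently, and is allele a with probability equal to a's current frequency. The function names and parameters are hypothetical.

```python
import random


def next_generation(freq_a, population_size, rng):
    """Sample 2 * population_size gene copies; return the new frequency of allele a."""
    copies = 2 * population_size
    count_a = sum(1 for _ in range(copies) if rng.random() < freq_a)
    return count_a / copies


def simulate(freq_a, population_size, generations, seed=0):
    """Return the list of allele-a frequencies, starting with the initial one."""
    rng = random.Random(seed)
    history = [freq_a]
    for _ in range(generations):
        freq_a = next_generation(freq_a, population_size, rng)
        history.append(freq_a)
    return history


small = simulate(0.5, population_size=10, generations=200)
large = simulate(0.5, population_size=10000, generations=200)
print(small[:6])     # frequencies wander noticeably in a small population
print(large[:6])     # and stay close to .5 in a large one
```

Once the frequency reaches 0 or 1 in this model, every later generation has the same value, which is the "no going back" behavior described above.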

K.1.5  Summary 

The computational theory that we have described in this book is well-suited to biological
applications for two important reasons:

• Proteins, DNA, and RNA can straightforwardly be represented as strings.

• Naturally occurring stochastic processes apply to them.


K.2  The  Sequence  Matching  Problem 

There now exist sophisticated techniques for analyzing DNA and protein molecules
and for determining the sequence of amino acids or nucleotides that they contain. This
process is called mapping or sequencing. In 2003, the Human Genome Project completed
its goals of describing the approximately 3 billion base pairs and identifying the
approximately 30,000 genes that make up human DNA. The genomes of other organisms,
ranging from the bacterium E. coli to chimpanzees, have also been mapped, as
have many of the proteins that are found throughout nature.

Consider  a  set  of  related  organisms.  They  will  share  many  of  their  proteins,  as  well  as 
much  of  their  DNA  and  RNA.  But,  when  molecules  are  looked  at  as  sequences  of 
amino  acids  or  nucleotides,  they  will  be  similar  but  not  identical  for  two  reasons: 

• Mutations can occur during reproduction. Those changes will, in turn, cause changes
to the proteins that make up the organism, resulting in individual differences within a
species, as well as the more significant differences that can be observed across species.

• Many mutations have no effect on the function of the protein, DNA, or RNA molecules
that they modify. So they may be passed on, without effect, to the descendants
of the original organism in which they occurred. Thus even very similar organisms
may possess different DNA and different proteins.

Proteins are very long molecules and DNA strands are even longer. So, to analyze
them, it makes sense to break them apart into shorter (hopefully significant) subsequences.
There are a variety of techniques available for doing this. The important thing
about these techniques is that they cut up long sequences in predictable ways. For example,
there is a family of enzymes called restriction enzymes, each of which cuts
double-stranded DNA only in places that contain a particular nucleotide sequence. So if similar
DNA molecules are subjected to the same processes, they will produce similar sequences.

Comparing DNA, RNA, or protein sequences can help to answer the following
kinds of questions:

• Given an organism (from which we have a DNA, RNA, or protein sequence), how
is it related to other organisms? The closer the match between the sequences, the
more closely related the organisms are likely to be.

• Given a DNA, RNA, or protein sequence, what function does it perform? What
parts of the sequence are important in the performance of the function? If the sequence
is very similar to sequences in other organisms, or even other molecules in
the same organism, and we already know what those molecules do, we may have an
answer. Or, if we find similar sequences in other molecules, perhaps it is possible to
figure out what all of them do by looking for similarities in what we know about all
of them.

• Given a DNA, RNA, or protein sequence from a diseased organism and one from a
healthy organism, what is the difference? This may help us understand the cause or
a potential treatment for the disease.

There now exist large databases of known protein, DNA, and RNA sequences
from known organisms. But a substantial computational problem remains: comparing
sequences to each other. Generally the goal of such comparisons is to find related sequences
that are evolutionarily as close as possible to each other. In other words, we
want to find sequences from which the current sequence could have evolved with the
smallest number of mutations. Because the sequences can be described as strings, another
way to describe the problem is as string matching. Because related sequences are
not necessarily identical, the problem is one of approximate string matching. For a
good introduction to the variety of computational techniques that have been developed
to solve this problem, see [Durbin et al. 1998].

All of these techniques rely on the notion of alignment. Two or more sequences are
aligned if they are arranged in a way that minimizes some notion of evolutionary distance.
One strategy, for example, is to maximize the number of positions that contain
the same amino acid or nucleotide. But each alignment algorithm exploits its own specific
measure of closeness. For example, some rate pairs of amino acids as either identical
or different, while others consider amino acids in the same class (as shown in
Table K.1) to be more similar than ones in different classes.
Some alignment algorithms are global: They try to align entire sequences so that as
many symbols as possible match. Other algorithms are local: They try to find smaller
subsequences that match exactly or almost exactly, even if the rest of the alignment
produced by that match isn't very good.


EXAMPLE  K.1  Aligning  Amino  Acid  Sequences 
Consider  the  four  amino  acid  sequences: 

AGHTYWDNR ,  AGHDTYENNRY,  YPAGQDTYWNN ,  AGHDTTYWNN 
In  this  simple  case,  a  straightforward  alignment  is: 

AGH  T  YWDNR 
AGHDT  YENNRY 
YPAGQDT  YWNN 
AGHDTTYWNN 
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Similar  (but  not  necessarily  identical)  sequences  can  be  aligned  as  shown  above. 
Such  sequences  are  probably  related  (both  genetically  and  functionally).  But  they  may 
differ  as  a  result  of  three  kinds  of  mutations: 

• substitution: For example, in the third sequence above, Q has been substituted for H.
In evaluating the closeness of an alignment, some substitutions are typically assigned
higher distances (and so alignments that include them are ranked as less
close) than others. Sometimes the distance is based on chemical and structural properties
of the corresponding amino acids or nucleotides. But in one common scheme,
the PAM family of distance matrices for amino acids, the distances are based on
the probability that one amino acid will replace another during evolution. The Q/H substitution
in the third sequence has very low evolutionary distance (i.e., it is very likely
to occur), while the E/W substitution in the second sequence has a high one.

•  deletion:  For  example,  the  D  is  missing  from  the  first  sequence  above. 

•  insertion:  For  example,  an  extra  T  has  been  inserted  in  the  fourth  sequence  above. 

In  the  rest  of  this  section,  we  consider  a  collection  of  techniques  for  solving  the  fol¬ 
lowing  problems: 

1.  Given  two  sequences,  find  the  best  alignment  between  them. 

2.  Given  one  new  sequence  and  a  database  of  known  sequences,  find  the  known 
ones  that  are  most  likely  to  be  related  to  the  new  one. 

3.  Given  one  or  more  patterns  that  describe  related  families  of  sequences,  compare 
the  pattern  to  an  individual  sequence  or  to  a  database  of  sequences  and  find  close 
matches. 


K.3  DNA  and  Protein  Sequence  Matching  Using  the  Tools 
of  Regular  Languages 

We  begin  by  describing  three  techniques  that  can  be  used  to  solve  alignment  problems 
involving  proteins  and  DNA. 

•  Deterministic  FSMs  are  used  in  BLAST,  a  very  fast  query  engine  that  operates  on 
huge  databases  and  solves  problem  2  above. 

•  Regular  expressions  can  be  used  to  specify  motifs,  or  patterns  that  describe  a  related 
set  of  sequences. These  patterns  are  used  to  solve  problem  3  above. 

•  Hidden  Markov  models  can  be  used  both  for  pairwise  matching  (problem  1  above), 
as  well  as  to  model  known  families  of  sequences  and  to  compute  the  probability 
that  other  sequences  are  related  to  that  family  (problem  3  above). 

All of these techniques make the assumption that the mutations that caused the variation
among related sequences occurred, for the most part, independently of each other.
This independence assumption makes it possible to rely on techniques that are based on
models of regular languages. In Section K.4, we will consider phenomena, such as sequence
evolution and secondary structure prediction of RNA, in which distant parts of
a sequence interact with each other. To solve such problems it will be necessary to use
more powerful formal structures, such as stochastic context-free grammars.


K.3.1 Finite State Machines in BLAST

The  first  problem  we  will  consider  is  the  following:  Given  a  protein  or  DNA  sequence, 
find other sequences that have high-scoring local matches with it. The BLAST
[Altschul, Gish, Miller, Myers, and Lipman 1990] search engine is widely used to solve
this  problem.  There  are  several  versions  of  BLAST.  Some  of  them  now  do  global  as 
well  as  local  matches.  The  BLAST  family  of  search  engines  uses  a  variety  of  heuristic 
techniques  to  search  huge  databases  and  find  the  sequences  that  are  most  likely  to  be 
biologically  significant  matches  for  the  query  string. 

The core of the original BLAST system is a three-step process:

1.  Select  a  reasonably  small  number  w  (usually  between  4  and  20).  Examine  the 
query  string  and  select  the  substrings  of  length  w  that  are  good  candidates  for 
producing  local  matches  with  it. 

2.  Using  the  set  of  substrings  found  in  step  1  (called  words),  build  a  DFSM  that  can 
be  used  to  scan  a  database  of  known  sequences  and  identify  those  sequences  that 
have  high-scoring  local  matches  with  one  of  the  words.  Run  the  resulting  DFSM 
against  the  database  and  find  the  sequences  that  match. 

3.  Examine  the  matches  found  in  step  2  and  see  if  it  is  possible  to  extend  any  of 
them  to  build  longer  matching  sequences.  Assign  scores  to  all  of  those  extended 
matches  and  return  those  sequences  with  a  local  match  score  above  some  prede¬ 
termined  cutoff. 

The implementation of step 2 can take advantage of the observation that we made
at the end of Section 6.2.4: Given a finite set of keywords K, it is straightforward to
build a DFSM that matches all instances of elements of K. If we had to view the set K
as an arbitrary regular expression, build an NDFSM, and then convert it to a deterministic
one, the construction step could take time that grows exponentially in the size
of K. But a variant of the algorithm buildkeywordFSM, which we described in
Section 6.2.4, builds the required deterministic FSM in time that is proportional to the
sum of the lengths of the words in K. We need a variant of buildkeywordFSM because
we actually need a finite state transducer, not simply a recognizer. The job of the transducer
is to output each instance of a match when it finds it. In experiments that were
done in the early stages of the implementation of BLAST, other techniques for implementing
step 2 were tried, but the FSM approach yielded the highest performance in
searching large databases. Some later versions of BLAST now use other techniques, but
some retain the original FSM approach.
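
Steps 1 and 2 can be illustrated with a short sketch of a keyword-matching transducer built as a trie with failure links, in the style of the Aho-Corasick algorithm. This is an assumption about how such a machine might look, not the buildkeywordFSM variant itself or BLAST's actual code. The names words_of_length, build_keyword_transducer, and scan are hypothetical, and step 1 is simplified to taking every substring of length w, whereas BLAST selects only good candidates.

```python
from collections import deque


def words_of_length(query, w):
    """Simplified step 1: every distinct length-w substring of the query."""
    return {query[i:i + w] for i in range(len(query) - w + 1)}


def build_keyword_transducer(words):
    """Build goto, failure, and output tables for a finite set of keywords."""
    goto = [{}]        # goto[state][symbol] -> next state
    output = [set()]   # keywords reported on entering each state
    for word in words:
        s = 0
        for ch in word:
            if ch not in goto[s]:
                goto.append({})
                output.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        output[s].add(word)

    # Breadth-first pass: fail[t] is the state for the longest proper
    # suffix of t's string that is also a prefix of some keyword.
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            output[t] |= output[fail[t]]
    return goto, fail, output


def scan(text, machine):
    """Step 2: emit (start_index, keyword) for every match in text."""
    goto, fail, output = machine
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for word in output[s]:
            hits.append((i - len(word) + 1, word))
    return hits


if __name__ == "__main__":
    machine = build_keyword_transducer(words_of_length("GHDTY", 3))
    print(sorted(scan("AGHDTYW", machine)))
```

Here the failure links are followed while scanning rather than being compiled into a full transition table, so the tables stay proportional in size to the sum of the keyword lengths. Step 3 (extending and scoring the hits) is not shown.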


K.3.2 Regular Expressions Specify Protein Motifs

Given a collection of proteins that are already known to be related, and a sequence
alignment of them, we can define a motif, a conserved sequence of amino acids, that
we may know or hypothesize corresponds to some function of the proteins in which
these sequences occur.


EXAMPLE  K.2  Detecting  Motifs 

Suppose  that  we  have  the  following  fragments  (which  are  smaller  than  the  ones 
that  are  generally  considered,  but  they  illustrate  the  idea): 


E S   G H D T       Y Y N K   N R
      M D   TTTTT   S W Q S
  R   G S D TTT     P D M T
  A   G P   TT      W R N T
K Q   G E D TT      D G M T
  A   G M D TT      K P Q T
      M D   T       R W N S

      1 2 3 4       5 6 7 8   9

This  example  appears  to  contain  a  short  motif,  shown  in  bold.  Notice  that 
small  variations  (in  particular,  things  that  we  believe  do  not  affect  function)  may 
be  allowed. 


Once we have defined a motif, we would like to search to find occurrences of it in
other protein sequences. So we need a notation in which to describe the motif. Regular
expressions are often used to do this. Not all systems use exactly the same regular expression
syntax, but most use something similar to the syntax of Perl (Appendix O)
or Python.


EXAMPLE  K.3  Describing  Motifs  with  Regular  Expressions 

Continuing  with  the  example  we  started  above,  we  see  that: 

• The motif begins with G or with M, but if it starts with M then the M is the first
element of the sequence.

• Position 3 is D, but it is optional.

• Position 4 is T, which may be replicated some number of times.

• Position 7 can be anything except P (proline). It would be hard to hypothesize
this from so small a sample, but such an observation could be part of a motif
that was derived from a larger sample, so we include it here to illustrate the idea.

• Position 8 must be K, S, or T.

To  describe  this  motif  with  a  Perl/Python-style  regular  expression,  we  would 
write: 

(G|(^M)) X D? T T* X X [^P] [KST]

X is a shorthand for [ARND...], so it matches any amino acid. The character ^
means the left end of the sequence if it occurs outside []. Inside [], it means not
one of the characters listed.


The exact syntax that is used to specify these patterns varies across systems. For example,
Prosite, a database of motifs, uses a different format, but the structure of the
pattern expressions is the same.
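
As a hedged illustration, the motif of Example K.3 can be written for Python's standard re module. The constant AMINO (the 20 standard one-letter amino acid codes), the spelled-out form of X, and the function name find_motif are assumptions made for this sketch; the test strings are the fragments of Example K.2 read left to right.

```python
import re

# The 20 standard one-letter amino acid codes (assumed alphabet for X).
AMINO = "ACDEFGHIKLMNPQRSTVWY"
X = f"[{AMINO}]"

# (G|(^M)) X D? T T* X X [^P] [KST], with X expanded to a character class.
MOTIF = re.compile(rf"(?:G|^M){X}D?TT*{X}{X}[^P][KST]")


def find_motif(sequence):
    """Return the first substring of sequence that matches the motif, or None."""
    m = MOTIF.search(sequence)
    return m.group(0) if m else None


if __name__ == "__main__":
    for fragment in ["ESGHDTYYNKNR", "MDTTTTTSWQS", "AGPTTWRNT", "AMDTRWNS"]:
        print(fragment, "->", find_motif(fragment))
```

The last fragment is a made-up negative case: its M is not at the left end of the sequence and it contains no G, so the pattern does not match.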


K.3.3 HMMs for Sequence Matching

While  regular  expressions  are  a  good  way  to  define  motifs  that  are  derived  from  a  set 
of  very  similar  sequences,  the  technique  degrades  when  the  set  of  sequences  becomes 
fairly  diverse.  As  new  sequences  are  added  to  the  set,  more  and  more  special  cases 
must  be  included  in  the  regular  expression,  and  there  is  no  way  to  indicate  that  some  of 
them  are  rare,  truly  special  cases,  while  other  variants  are  found  in  almost  all  sequences 
in  the  family.  If  this  process  continues  long  enough,  the  motif  will  become  so  general 
that it will match many unrelated sequences. And there is no notion of “close match”
as opposed to “This one is a stretch.”

To  solve  this  problem,  we  need  a  new  technique  for  describing  related  families  of  se¬ 
quences.  In  particular,  we  need  a  technique  that  can  capture  the  variations  among  the 
members of the family and that records the probabilities associated with those variations.
We'll call such a representation a profile. Given a set of profiles describing a set of
sequence  families,  we  would  like  to  be  able  to  consider  a  new  sequence  and  compute  the 
probability  that  it  could  have  been  derived  from  each  of  the  known  families.  We’ll  then 
hypothesize  that  it  belongs  to  the  family  with  the  highest  probability  of  generating  it. 

To build a profile for a family, we'll assume an initial ancestral sequence and then describe
the mutations that produced the other members of the family. We'll associate a
probability with each such mutation. Then, given a new sequence s and a profile p, we can
compute the probability that s evolved from the ancestral sequence of p and thus belongs
to the family that is descended from that sequence. So, while the attempt to match a regular
expression against a sequence produces one of only two results, match or fail, matching
a sequence against a probabilistic profile produces some value in the range 0 to 1.

We begin by considering just the case in which all members of a family are identical except
for substitutions of one amino acid for another. Then we can build a simple hidden
Markov model (or HMM, as described in Section 5.11.2) that corresponds to the profile of
a family. Such a hidden Markov model is called a profile HMM. To describe a family of sequences
of length n, we will build a profile HMM M = (K, O, π, A, B), where:

• K is a set of n states, one for each position in the sequence.

• O is the output alphabet, namely the set of amino acid or nucleotide symbols.

• π contains the initial state probabilities. π[1] = 1 and all other values of π are 0.

• A contains the transition probabilities. ∀i < n (A[i, i + 1] = 1) and all other values
of A are 0.

• B contains the output probabilities. So, for any symbol a, if the probability that the
i-th symbol is an a is p, then B[i, a] = p.

So the state structure of M for a four-element sequence family would be as shown in
Figure K.4(a). M will start in state 1 and then move to 2, then 3, then 4. Associated with
each state is a vector that lists, for each output symbol, the probability that M will generate
it when it is in that state. Suppose, for example, that almost always the first element
of a sequence in this family is a W but rarely it may be a K instead. Then B[1, W]
might be .95, B[1, K] might be .05, and, for all other values v, B[1, v] would be 0.

Given a new sequence, the probability that it was generated by M (and so is related
to the sequences that were used to define M) can be computed using the forward algorithm,
described in Section 5.11.2.
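
For the substitution-only model just described, there is exactly one path through M (state 1, then 2, and so on), so the forward computation collapses to a product of output probabilities. The following sketch assumes that simplification. Only the values for state 1 (.95 for W, .05 for K) come from the text; the other emission tables and the name profile_probability are hypothetical.

```python
def profile_probability(sequence, emissions):
    """Probability that the left-to-right, substitution-only profile emits sequence.

    emissions[i] maps each symbol a to B[i + 1, a]; missing symbols have
    probability 0. Sequences of the wrong length get probability 0 because
    this simple model has no insertion or deletion states.
    """
    if len(sequence) != len(emissions):
        return 0.0
    p = 1.0
    for symbol, table in zip(sequence, emissions):
        p *= table.get(symbol, 0.0)
    return p


# A four-state family. State 1 uses the values from the text; states 2-4
# are made-up numbers for illustration.
FAMILY = [
    {"W": 0.95, "K": 0.05},
    {"A": 1.0},
    {"G": 0.5, "S": 0.5},
    {"T": 1.0},
]

if __name__ == "__main__":
    for s in ["WAGT", "KAST", "WAG", "QAGT"]:
        print(s, profile_probability(s, FAMILY))
```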

But substitutions are not the only mutations that can occur. Elements can be deleted
from a sequence and new ones can be added. So we need to expand the HMM to include
states that correspond to deletions and insertions. We can do that by building a
profile HMM with the structure shown in Figure K.4(b). The match states at the bottom,
represented as squares, correspond to an original ancestral sequence. The fact that new
elements could have been added to that sequence is encoded in a set of insertion states,
shown as diamonds. Each time the machine enters an insertion state, it outputs one symbol
according to the output probabilities associated with that state. Notice that there is
a transition from each insertion state back to itself since more than one element may be
inserted at any point. The fact that elements may have been deleted from the original
sequence is encoded in a set of deletion states, shown as circles. All the output probabilities
associated with deletion states are 0. So if the machine enters a deletion state instead
of the corresponding match state, one symbol from the original sequence will fail
to appear in the output sequence. We've shown all possible transitions among the three
kinds of states. The probabilities associated with those transitions (as well as the output
probabilities associated with the match and insertion states) must be acquired from a
training set of sequences that the model is designed to describe.

FIGURE K.4 An HMM that describes a protein sequence family.

Given  a  collection  of  profile  HMMs  of  this  sort  and  a  new  sequence  s,  we  can  use 
the  forward  algorithm  to  find  the  HMM  with  the  highest  likelihood  of  outputting  s. 

We  can  also  use  profile  HMMs  to  find  an  optimal  alignment  of  a  new  sequence  with 
a  known  family.  We  can  recast  this  problem  as  follows:  Given  a  profile  HMM  M  and  a 
sequence s, what path through M has the highest probability of outputting s? This
problem can be solved with the Viterbi algorithm, as described in Section 5.11.2.

K.4 RNA Sequence Matching and Secondary Structure
Prediction Using the Tools of Context-Free Languages

So far, we have considered the problem of aligning and matching DNA and protein sequences.
We have been able to define useful techniques for solving those problems using
the tools of regular languages. The reason this was possible is that we were able to make
the  assumption  that  whatever  mutations  occurred  and  caused  the  variations  among  the 
sequences  in  a  family  occurred  independently  of  each  other.  The  facts  of  primary  struc¬ 
ture  (sequence)  evolution  of  DNA  and  proteins  by  and  large  support  that  assumption. 

But now let's consider RNA. The secondary (three-dimensional) structure of an
RNA molecule is usually critical to its function. Because of the way that RNA molecules
fold, a change to a single nucleotide in a stem (paired) region could completely
alter  the  molecule's  shape  and  thus  its  function.  So  it  turns  out  that,  for  many  RNA  mol¬ 
ecules,  secondary  structure  is  more  likely  to  be  conserved  than  primary  (sequence) 
structure  is.  If  secondary  structure  is  to  be  conserved,  then  any  change  to  one  nucleotide 
in  a  stem  must  be  matched  by  a  corresponding  change  to  the  paired  nucleotide. 

Let's  return  to  the  example  RNA  fragment  that  we  considered  in  Section  K.1.3: 

AACCCACUCUAAAUCUCGUGCCACUCC 

We saw that this fragment is likely to fold to form the structure shown in Figure
K.5(a), which contains one stem that is six base pairs long (shown with the lines between
the paired bases). We can also represent the paired bases in the stem as shown in
Figure K.5(b).

Suppose that there were a mutation and the U in position 8 (counting from the left)
were replaced by C. Then, in order to preserve the folding structure, the A in position 23
would also have to change (to G). On the other hand, any number of substitutions could
occur in the loop region without changing the secondary structure of the molecule. It is
quite common to find related RNA molecules whose sequences are very different but
whose secondary structure (and thus function) have been conserved.

The nested dependencies that determine secondary structure cannot be described
by a regular expression or recognized by a finite state machine. But they can be described
by a context-free grammar. Notice that the structure is very similar to other
nested structures (for example matched parentheses and palindromes) that we have already
described with CFGs.

FIGURE K.5 An RNA molecule folding.

The real story on RNA folding is more complicated than what we have just described.
So, before we attempt to write a grammar that describes a family of strings
with the same secondary structure, we will add one more twist: Complementary pairs
(G - C and A - U) are the most likely to form base pairs that build stems. But other pairs
can also be joined to form base pairs. In particular, G - U, although less likely than the
complementary pairs to bond, can do so not infrequently. So what we need, to model
this phenomenon, is a stochastic context-free grammar of the sort we described in
Section 11.10, coupled with an appropriate stochastic parser.

The rules for a fragment of a stochastic CFG that describes RNA sequences with a
three nucleotide tail, a six base pair stem, and a seven nucleotide loop would then be
something like the following (with the probabilities shown in brackets):

<family> → <tail> <stemloop>  [1]  /* <stemloop> builds a six base pair stem plus a loop.
<tail> → <base> <base> <base>  [1]
<stemloop> → C <stemloop-5> G  [.23]  /* <stemloop-5> builds a five base pair stem plus a loop.
<stemloop> → G <stemloop-5> C  [.23]
<stemloop> → A <stemloop-5> U  [.23]
<stemloop> → U <stemloop-5> A  [.23]
<stemloop> → G <stemloop-5> U  [.03]
<stemloop> → U <stemloop-5> G  [.03]
<stemloop-5> → ...

There are two kinds of questions we would like to be able to answer with a grammar
such as this:

• Given a new sequence s and a grammar G that describes a family of known sequences,
what parse tree describes the most likely way in which G could have generated s? That
parse predicts s's secondary structure. To answer this question requires a stochastic
parser that is the context-free equivalent of the Viterbi algorithm.

• Given a new sequence s and a grammar G that describes a family of known sequences,
what is the probability that G generated s via some path? The answer to
this question allows us to compare families and find the one to which s is most likely
to belong. To answer this question requires the context-free equivalent of the forward
algorithm for HMMs.

As  with  all  probabilistic  models,  to  be  able  to  answer  any  of  these  questions  requires 
that  we  obtain,  typically  from  a  set  of  training  instances,  the  probabilities  associated 
with  the  rules  of  the  grammar. 
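
For a grammar whose shape is fixed (a three nucleotide tail, a six base pair stem, and a seven nucleotide loop), each string has at most one parse, so both questions reduce to multiplying the probabilities of the rules used. The sketch below rests on several loudly stated assumptions: the pair probabilities (.23 for G-C, C-G, A-U, and U-A; .03 for G-U and U-G) are reused at every level of the stem, although the text elides the <stemloop-5> rules; every <base> rule is given a hypothetical uniform probability of .25; and the name stemloop_probability is invented here.

```python
# Pair probabilities assumed to apply at every stem level (the source
# gives them only for the outermost pair).
PAIR_PROB = {
    ("C", "G"): 0.23, ("G", "C"): 0.23,
    ("A", "U"): 0.23, ("U", "A"): 0.23,
    ("G", "U"): 0.03, ("U", "G"): 0.03,
}
BASE_PROB = 0.25  # hypothetical uniform probability for each <base> rule


def stemloop_probability(seq, tail=3, stem=6, loop=7):
    """Probability of seq under the fixed-shape tail + stem + loop grammar."""
    if len(seq) != tail + 2 * stem + loop:
        return 0.0
    if any(ch not in "ACGU" for ch in seq):
        return 0.0
    p = BASE_PROB ** tail            # the three unpaired tail bases
    body = seq[tail:]                # what <stemloop> must generate
    for k in range(stem):
        # The k-th base from the left pairs with the k-th from the right.
        p *= PAIR_PROB.get((body[k], body[-1 - k]), 0.0)
    p *= BASE_PROB ** loop           # the seven unpaired loop bases
    return p


if __name__ == "__main__":
    good = "AAA" + "GGGAUC" + "AAAAAAA" + "GAUCCC"
    wobble = "AAA" + "GGGAUC" + "AAAAAAA" + "GAUCCU"   # outer pair is G-U
    broken = "AAA" + "GGGAUC" + "AAAAAAA" + "GAUCCA"   # outer pair is G-A
    for s in (good, wobble, broken):
        print(s, stemloop_probability(s))
```

A general stochastic parser would also have to consider alternative structures and sum or maximize over them; this sketch sidesteps that by fixing the structure in advance.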

The  discussion  presented  here  has  barely  scratched  the  surface  of  this  problem.  For 
substantial additional detail, see [Durbin et al. 1998].


Complexity  of  the  Algorithms  Used  in  Computational 
Biology 

Obvious  approaches  to  many  of  the  problems  we  have  sketched  here  are  computation¬ 
ally  intractable.  For  example,  consider  the  problem  of  sequencing  a  large  DNA  or  pro¬ 
tein  molecule.  Since  it  is  difficult  to  do  that  directly,  the  standard  procedure  is  to  clone 
the  molecule  and  then  break  the  copies  randomly  into  smaller  molecules  that  can  be 
sequenced  individually.  Then  it  remains  to  figure  out  how  the  smaller  molecules  were 
connected in the original one. Thinking of the molecules as strings, our goal is to find a
single string that contains each of the shorter strings as a substring. Clearly we can find
one such string simply by concatenating together all of the shorter strings. But we assume
that that is not the original string. Instead we assume that the most likely original
string is the shortest string that contains each of the observed substrings. Finding the
shortest common superstring of a set of strings is NP-hard. When we convert that problem
to a decision problem we get the language:

• SHORTEST-SUPERSTRING = {<S, k> : S is a set of strings and there exists some
superstring T such that every element of S is a substring of T and T has length less
than or equal to k}.

SHORTEST-SUPERSTRING is NP-complete.
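
Because the exact problem is hard, practical approaches often settle for heuristics. The sketch below shows one common idea, a greedy merge of the pair of fragments with the largest overlap, together with a simple certificate check for the decision problem. The greedy result is a valid superstring but is not guaranteed to be the shortest one, and the function names are illustrative assumptions.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings):
    """Repeatedly merge the two fragments with the largest overlap."""
    unique = set(strings)
    # Fragments contained in another fragment add nothing; drop them.
    items = sorted(s for s in unique
                   if not any(s != t and s in t for t in unique))
    while len(items) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(items):
            for j, b in enumerate(items):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        merged = items[best_i] + items[best_j][best_k:]
        items = [s for idx, s in enumerate(items)
                 if idx not in (best_i, best_j)] + [merged]
    return items[0] if items else ""


def is_certificate(strings, k, candidate):
    """Check that candidate witnesses <strings, k> in SHORTEST-SUPERSTRING."""
    return len(candidate) <= k and all(s in candidate for s in strings)


if __name__ == "__main__":
    fragments = ["AGGT", "GGTC", "TCAA"]
    t = greedy_superstring(fragments)
    print(t, is_certificate(fragments, 7, t))
```

The certificate check runs in polynomial time, which is what places the decision problem in NP; finding a candidate that satisfies it for the smallest possible k is the hard part.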

As another example, consider the problem of predicting how proteins will fold. Obvious
approaches require search. In some cases, it is possible to prove that no efficient
algorithm exists unless P = NP. For example, [Berger and Leighton 1998] shows that the
protein-folding problem, given the hydrophobic-hydrophilic model of protein structure,
is NP-complete.


Applications: Natural Language
Processing

Why  should  we  care  about  the  formal  properties  of  a  language  like  English? 

More  than  375  million  people  speak  English  as  their  first  language.  The 
number  would  be  even  larger  if  we  had  picked  Chinese  instead.  Why  do 
computers  need  to  be  involved? 

Millions  of  words  of  text  are  posted  on  the  Internet  every  day.  People  can  read  any  one 
page  of  that  easily,  but  none  of  us  can  sort  through  that  mountain  of  words  to  find  the  one 
page  we're  interested  in.  Computers  are  good  at  that.  If  programs  could  read  English  and 
Spanish and Chinese, real automated query agents could exist. To build such programs, we
need an understanding of the phenomenon: What are the sentences of English (or Spanish
or Chinese) and how are they structured? The formal theory we have presented is just a
small beginning of that analysis, but it is a beginning. In this section, we summarize a few of
the ways in which our theory can be applied in natural language processing (NLP) systems.
For a comprehensive treatment of modern NLP techniques, see [Jurafsky and Martin
2000]. For more on statistical techniques in particular, see [Manning and Schütze 1999].

L.1  Morphological  Analysis 

The  first  step  in  almost  any  approach  to  processing  natural  language  sentences  is  to 
find  the  words  and  look  them  up  in  a  dictionary.  For  English,  this  is  only  slightly  more 
difficult  than  it  sounds.  For  some  other  languages,  it  is  substantially  more  so.  Although 
we will limit this discussion to English, we'll still be able to see most of the major issues.

A  morpheme  is  the  smallest  linguistic  unit  to  which  meaning  can  be  assigned.  Consider 
words like animal, destroy, and cacophony. Each of these words is composed of a single
morpheme  and  exists  in  any  reasonable  dictionary  of  English.  But  now  consider: 

• loudly: the simple word loudly is composed of two morphemes:

    • loud + ly (adverb).

• likes: unfortunately, there are two ways to analyze the simple word likes:

    • like (noun) + s (plural).

    • like (verb) + s (third person singular).

• leaves: there may be three or more different morphological analyses:

    • leaf (noun) + s (plural).

    • leave (noun, as in military leave) + s (plural).

    • leave (verb) + s (third person singular).

• disparagingly: this single word can be decomposed into three (and sometimes
even more) morphemes:

    • disparag- (verb) + ing (progressive) + ly (adverb).

• skies: sometimes the stem is changed when it combines with an affix. Here the
final y of sky is rewritten as ie before an s is added:

    • sky (noun) + e + s (plural).

• toys: but whether the y changes depends on the preceding letter:

    • toy (noun) + s (plural).

• went: sometimes neither the root nor the standard form of the affix is anywhere to
be found:

    • go (verb) + ed (past).

• fish: there are two noun analyses (plus a verb one) because the plural affix is rendered
as the empty string:

    • fish (noun), with no affixes, so the default “unmarked” form, singular, is meant.

    • fish (noun) + (plural).

    • fish (verb).

• women: sometimes (although rarely in English) the root word changes internally instead
of simply adding an affix before or after it:

    • woman (noun) + (plural).

• unfriendly: affixes can be added to both the front and the end of a root:

    • un (negative) + friend + ly (adverb).

Depending  on  the  dictionary,  some  or  all  of  these  words  may  be  found.  But  new  words 
come  into  the  language  on  a  regular  basis  and  the  process  by  which  new  words  are  formed 
by adding affixes is productive, meaning that the instant the word blog appears, so do the
words blogger, blogging, blogged, unblog and so forth. (Just as an aside, as this paragraph
was written, none of those words were in the dictionary that the spell checker was
using.) So it makes sense to encode the regular rules for adding affixes, rather than to require
that all inflected forms be entered explicitly in the dictionary. Further, there are languages,
for example Finnish and Turkish, in which there may be hundreds or thousands of
forms  of  a  single  verb.  It  would  be  completely  impractical  to  list  them  all. 

If our task is to read and interpret English text, then we need a technique for analyzing
words in their surface form (the form, such as leaves, in which they appear in text) and
translating them into the form we'll call lexical (in which a word is represented as a root
form, the main meaning unit listed in the dictionary, plus one or more affixes that modify
the main meaning). If, on the other hand, we are building an application that must generate
English sentences or text (for example in a conversational system, or a text summary
system, or the output side of a machine translation system), then we need a way to map
from lexical (meaning) form to surface form. It makes sense, from a software engineering
perspective, as well as from a linguistic one, to attempt to capture the facts about those
mappings in a single representation that can be used in either direction.

We can do this with a bidirectional finite state transducer of the sort that we described
in Section 5.10. Such a system is called a two-level morphological analyzer. For a good
description of the evolution of this idea in linguistics, see [Karttunen and Beesley 2001].
We show here a very tiny fragment of a morphological transducer for English. While a
practical one might have 100,000 or more transitions, it does not need to be built by hand.
It can be compiled from a dictionary and a set of rules that describe the order in which affixes
can be applied, as well as the spelling alternations (like converting y to i) that English
requires. For more detail on how to make this work in practice, see [Jurafsky and
Martin 2000]. In our example, we use the notation a/b, just as we did in Section 5.10. In
this case, a/b means that the symbol a occurs in the surface form and the symbol b occurs
in the lexical form. When the symbol is the same on both levels, we will write it simply as
a. We use # in surface forms to indicate a word boundary. We use the following symbols
in lexical forms: +N (noun), +V (verb), +ADJ (adjective), +SG (singular), +PL (plural),
and +3SG (3rd person singular, for verbs).


EXAMPLE L.1 A Tiny Morphological Transducer
This simple transducer can, for example, perform the following mappings in either
direction: 

• leaf# <=> leaf +N +SG

• leave# <=> leave +N +SG

• leave# <=> leave +V

• leaves# <=> leaf +N +PL

• leaves# <=> leave +N +PL

• leaves# <=> leave +V +3SG

• lean# <=> lean +ADJ

• lean# <=> lean +V


Note  that  when  the  transducer  that  we  just  built  is  run  in  the  surface  form  to  lexical 
form  direction  it  is  nondeterministic.  It  may  output  more  than  one  lexical  form  and 
those  forms  may  have  different  part  of  speech  tags  (e.g.,  N  or  V).  But,  of  course,  a 
straightforward  dictionary  lookup  can  also  report  more  than  one  possible  part  of 
speech  for  a  single  word.  In  the  next  section,  we  consider  a  technique  for  resolving 
that  ambiguity. 
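
As a rough illustration of running the mappings of Example L.1 in both directions, the following Python sketch stores the eight surface/lexical pairs in a plain lookup table. This is a simplification made for brevity: a practical two-level analyzer would be a compiled finite state transducer rather than a table, and the names PAIRS, analyze, and generate are hypothetical.

```python
from collections import defaultdict

# The eight surface/lexical pairs listed in Example L.1.
PAIRS = [
    ("leaf#", "leaf +N +SG"),
    ("leave#", "leave +N +SG"),
    ("leave#", "leave +V"),
    ("leaves#", "leaf +N +PL"),
    ("leaves#", "leave +N +PL"),
    ("leaves#", "leave +V +3SG"),
    ("lean#", "lean +ADJ"),
    ("lean#", "lean +V"),
]

# Index the same relation in both directions.
_SURFACE_TO_LEXICAL = defaultdict(list)
_LEXICAL_TO_SURFACE = defaultdict(list)
for surface, lexical in PAIRS:
    _SURFACE_TO_LEXICAL[surface].append(lexical)
    _LEXICAL_TO_SURFACE[lexical].append(surface)


def analyze(surface):
    """Surface form to lexical forms; may return several (nondeterministic)."""
    return list(_SURFACE_TO_LEXICAL.get(surface, []))


def generate(lexical):
    """Lexical form to surface forms."""
    return list(_LEXICAL_TO_SURFACE.get(lexical, []))
```

Calling analyze("leaves#") returns three lexical forms, two tagged +N and one tagged +V, which is the ambiguity that the text describes. Calling generate("leaf +N +PL") returns the single surface form leaves#.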


Part  of  Speech  Tagging 

Consider the problem of parsing an English sentence such as Store ice in the cooler.
We'd like to do that using a grammar of the sort we presented in Example 11.6. The rules of
that grammar are written using part of speech (POS) tags, such as N(oun), V(erb), and
Adj(ective). So, before those rules can be applied to a sentence, it is necessary to tag each
word with the part of speech that corresponds to the way the word is functioning in that
sentence. We can begin by looking the words up in a dictionary or by using a morphological
analyzer of the sort we just described. If we do that for the words in Store ice in the
cooler, we'll get the values shown in Table L.1.

We've simplified a bit here. Practical NLP systems use tag sets with somewhere between
45 and 200 tags, so the classes that we have labeled N, V, and Adj would be further
subdivided, thus making the disambiguation problem even more difficult.


Table L.1 Words and their part of speech tags.

Word      Tags
store     N, V
ice       N, V
cooler    N, Adj

To build a part of speech (POS) tagger, we need to make use of two kinds of
information:

• Given a word, what tags can be applied and how likely is each of them? For example,
while the word store can be either a noun or a verb, it is more likely to be a
noun than it is to be a verb.

• Given a particular sentential context, what tag is most likely to come next? For
example, a verb rarely comes after the word the.

There  are  two  approaches  one  can  take  to  capturing  that  information  and  encoding 
it  in  an  effective  POS  tagging  system. 

•  Create  a  set  of  rules  that  describe  the  facts.  Match  the  rules  against  input  sentences 
to  build  a  tag  sequence. 

• Build a hidden Markov model (HMM). Use the Viterbi algorithm that we discussed
in Section 5.11.2 to find the path that is most likely to have produced the observed
sequence of words. We'll briefly sketch that approach here.

Consider the sentence we are analyzing. Imagine that whatever process generated
it actually generated a sequence of parts of speech. We want to know what that
sequence is but we cannot observe it directly. All we can observe is the sequence of
words that were generated from those parts of speech. So we can build an HMM in
which the (hidden) states correspond to parts of speech, the outputs correspond to
the words, and the probabilities describe the likelihood of one part of speech following
another and the likelihood that a particular part of speech will be realized as a
particular word.

So  we  can  build  a  straightforward  HMM  for  POS  tagging  as  follows: 

•  Let  K  contain  one  state  for  each  part  of  speech  tag. 

•  Let  O  be  the  set  of  possible  words. 

• Let π describe the probabilities, for each tag, of a sentence starting with that tag.

• Let A describe the transition probabilities, i.e., the probability, given some tag t1,
that the next tag will be t2.

• Let B describe the output probabilities, i.e., the probability, given some tag t, that
the word that corresponds to that tag is word w.

The probabilities in π, A, and B have to come from somewhere. Fortunately, there
are large datasets (generally called corpora) of sentences that have already been
tagged, so they can be used to train the HMM.
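
To make the roles of the initial, transition, and output probabilities concrete, here is a minimal Python sketch of Viterbi decoding for the sentence store ice in the cooler. All of the numbers are invented for illustration and were not estimated from any corpus. The output table lists only the words of this one sentence, so its rows do not sum to 1. The tags Prep and Det and the names PI, A, B, and viterbi are assumptions of the sketch.

```python
TAGS = ["N", "V", "Prep", "Det", "Adj"]

# Invented probabilities, for illustration only.
PI = {"N": 0.4, "V": 0.3, "Det": 0.2, "Adj": 0.05, "Prep": 0.05}
A = {  # A[t1][t2]: probability that tag t2 follows tag t1
    "N": {"N": 0.1, "V": 0.4, "Prep": 0.4, "Det": 0.05, "Adj": 0.05},
    "V": {"N": 0.35, "V": 0.05, "Prep": 0.2, "Det": 0.35, "Adj": 0.05},
    "Prep": {"N": 0.3, "Det": 0.6, "Adj": 0.1},
    "Det": {"N": 0.7, "Adj": 0.3},
    "Adj": {"N": 0.8, "Adj": 0.2},
}
B = {  # B[t][w]: probability that tag t is realized as word w
    "N": {"store": 0.3, "ice": 0.3, "cooler": 0.4},
    "V": {"store": 0.6, "ice": 0.4},
    "Prep": {"in": 0.3},
    "Det": {"the": 0.6},
    "Adj": {"cooler": 0.2},
}


def viterbi(words, tags, pi, a, b):
    """Return the most probable tag sequence for a non-empty list of words."""
    # Each column maps a tag to (best path probability, previous tag).
    columns = [{t: (pi.get(t, 0.0) * b[t].get(words[0], 0.0), None) for t in tags}]
    for word in words[1:]:
        previous = columns[-1]
        column = {}
        for t in tags:
            prob, prev = max((previous[p][0] * a[p].get(t, 0.0), p) for p in tags)
            column[t] = (prob * b[t].get(word, 0.0), prev)
        columns.append(column)
    # Start from the best final tag and follow the back pointers.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]
```

With these invented numbers, viterbi("store ice in the cooler".split(), TAGS, PI, A, B) selects V N Prep Det N. After two words the path N V scores slightly higher than V N, but the following preposition makes V N the better prefix, which shows how sentential context resolves the ambiguity. A practical tagger would work with log probabilities and smoothed estimates rather than raw products.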

The model that we have just described is called a bigram tagger. It decides on the
tag for the current word by considering the tag of just the single preceding word. We
could extend the idea and create states that correspond to a sequence of two tags. Then,
in A, we'd capture the probability that tag t2 follows a particular sequence of two prior
tags. We'll call such a model a trigram tagger.

While  the  overall  problem  of  NLP  is  far  from  solved,  the  POS  tagging  piece  of  the 
problem  is  very  nearly  so.  Current  POS  taggers  report  accuracies  in  the  range  of  97% 
(i.e.,  97%  of  the  words  are  tagged  correctly).  Some  taggers  can  be  set  to  return  not 
just  the  single  POS  sequence  (i.e.,  path  through  the  HMM)  with  the  highest  score 
but some specified number of highest scoring sequences. Those paths can be presented
to  a  parser,  which  can  then  be  asked  to  select  the  one  that  is  consistent  with  the  rules  of 
the  grammar. 


The  Grammar  of  English 

In  this  section,  we  consider  the  problem  of  building  a  formal  model  for  the  syntax 
(grammar)  of  English.  We  will  attempt  to  answer  the  question,  “How  powerful  does 
such  a  model  have  to  be  in  order  to  describe  the  facts  about  grammatically  correct 
English  sentences?” 


.1  Is  English  Regular? 

If the set of grammatical English sentences is finite, then it is regular. It is finite if there
is a finite number of words and a longest grammatical sentence. At any particular point
in time and for any specific dialect of English, there exists the longest word ever used
and the longest sentence ever spoken or written. But, in principle, the next day someone
could add a bit to that longest sentence to make a yet longer one. Just as an example
that shows that there are sentences that are much longer than whatever upper limit
you probably have in mind, consider the 630-word sentence shown in Figure L.1. It
announces a local government's intention to move a path.

In Example 8.19, we assumed that there was no bound on the length of English
sentences and we gave one proof that English is not regular. That proof was based
on the structure of sentences such as The rat that the cat saw ran. It is possible
to do different but similar proofs based on other naturally recursive structures
in English.

For  example,  consider  the  following  argument  from  [Chomsky  1957],  based  on  a 
fragment  of  a  grammar  for  English  sentences: 

S → if S then S
S → either S or S
S → the man who said S is arriving today

Any  grammatical  sentence  that  combines  the  constructs  defined  by  those  three 
rules  must  properly  nest  all  if/then  and  either/or  pairs.  For  example,  the  following 
string  is  grammatical: 

If  either  the  man  who  said  if  it  rains  then  we  can't  go  is 
arriving  today  or  the  man  who  said  if  it’s  sunny  then  we  must 
go  is  arriving  today  then  we  must  go. 

By the way, while it is very unlikely that anyone would ever write such a sentence,
and if they did, they would almost certainly use commas, keep in mind that English was
first a spoken language, in which pauses and inflection make it possible to utter
sentences like this one and have them be understandable.

A path from a point approximately 350 metres east of the most south westerly corner of 17
Batherton Close, Widnes and approximately 208 metres east-south-east of the most southerly
corner of Unit 3 Foundry Industrial Estate, Victoria Street, Widnes, proceeding in a generally
east-north-easterly direction for approximately 28 metres to a point approximately 202 metres
east-south-east of the most south-easterly corner of Unit 4 Foundry Industrial Estate, Victoria
Street, and approximately 347 metres east of the most south-easterly corner of 17 Batherton
Close, then proceeding in a generally northerly direction for approximately 21 metres to a point
approximately 210 metres east of the most south-easterly corner of Unit 5 Foundry Industrial
Estate, Victoria Street, and approximately 202 metres east-south-east of the most north-easterly
corner of Unit 4 Foundry Industrial Estate, Victoria Street, then proceeding in a generally
east-north-east direction for approximately 64 metres to a point approximately 282 metres
east-south-east of the most easterly corner of Unit 2 Foundry Industrial Estate, Victoria Street,
Widnes and approximately 259 metres east of the most southerly corner of Unit 4 Foundry
Industrial Estate, Victoria Street, then proceeding in a generally east-north-east direction for
approximately 350 metres to a point approximately 3 metres west-north-west of the most north
westerly corner of the boundary fence of the scrap metal yard on the south side of Cornubia
Road, Widnes, and approximately 47 metres west-south-west of the stub end of Cornubia Road
be diverted to a 3 metre wide path from a point approximately 183 metres east-south-east of the
most easterly corner of Unit 5 Foundry Industrial Estate, Victoria Street and approximately 272
metres east of the most north-easterly corner of 26 Ann Street West, Widnes, then proceeding in
a generally north easterly direction for approximately 58 metres to a point approximately 216
metres east-south-east of the most easterly corner of Unit 4 Foundry Industrial Estate, Victoria
Street and approximately 221 metres east of the most southerly corner of Unit 5 Foundry
Industrial Estate, Victoria Street, then proceeding in a generally easterly direction for
approximately 45 metres to a point approximately 265 metres east-south-east of the most
north-easterly corner of Unit 3 Foundry Industrial Estate, Victoria Street and approximately 265
metres east of the most southerly corner of Unit 5 Foundry Industrial Estate, Victoria Street,
then proceeding in a generally east-south-east direction for approximately 102 metres to a point
approximately 366 metres east-south-east of the most easterly corner of Unit 3 Foundry
Industrial Estate, Victoria Street and approximately 463 metres east of the most north easterly
corner of 22 Ann Street West, Widnes, then proceeding in a generally north-north-easterly
direction for approximately 19 metres to a point approximately 368 metres east-south-east of the
most easterly corner of Unit 3 Foundry Industrial Estate, Victoria Street and approximately 512
metres east of the most south easterly corner of 17 Batherton Close, Widnes then proceeding in a
generally east-south-easterly direction for approximately 16 metres to a point approximately 420
metres east-south-east of the most southerly corner of Unit 2 Foundry Industrial Estate, Victoria
Street and approximately 533 metres east of the most south-easterly corner of 17 Batherton
Close, then proceeding in a generally east-north-easterly direction for approximately 240 metres
to a point approximately 606 metres east of the most northerly corner of Unit 4 Foundry
Industrial Estate, Victoria Street and approximately 23 metres south of the most south westerly
corner of the boundary fencing of the scrap metal yard on the south side of Cornubia Road,
Widnes, then proceeding in a generally northern direction for approximately 44 metres to a point
approximately 3 metres west-north-west of the most north westerly corner of the boundary fence
of the scrap metal yard on the south side of Cornubia Road and approximately 47 metres
west-south-west of the stub end of Cornubia Road.


FIGURE L.1 A very long English sentence.

So, continuing with the example: If English were regular, then we could apply the
following  substitution  to  English  sentences  and  the  resulting  language  would  also  be 
regular: 

• Replace every instance of if, either and the man who said by (.

• Replace every instance of then, or and is arriving today by ).

To help make this example clearer, let's also substitute c for each instance of a
sentence that contains no embedded sentences. With these substitutions, the sentence we
just wrote would be rewritten as:

((((c)c))((c)c))c

If English were regular, the substituted language that we just defined would also be
regular since the regular languages are closed under letter substitution. But we can
show that the substituted language is not regular by a Pumping Theorem proof that is
almost identical to the one we used in Example 8.10 to show that the language of
balanced parentheses is not regular. So English is not regular either.
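
The substitution itself is mechanical, and a short Python sketch can carry it out on the example sentence. The sketch assumes lowercase matching on word boundaries and treats every remaining run of words as a sentence with no embedded sentences. The names OPENERS, CLOSERS, substitute, and is_balanced are hypothetical.

```python
import re

OPENERS = ["the man who said", "either", "if"]   # each becomes "("
CLOSERS = ["is arriving today", "then", "or"]    # each becomes ")"

SENTENCE = (
    "If either the man who said if it rains then we can't go is "
    "arriving today or the man who said if it's sunny then we must "
    "go is arriving today then we must go."
)


def substitute(sentence):
    """Apply the substitution from the text and drop the whitespace."""
    s = sentence.lower().rstrip(".")
    for phrase in OPENERS:
        s = re.sub(r"\b" + re.escape(phrase) + r"\b", "(", s)
    for phrase in CLOSERS:
        s = re.sub(r"\b" + re.escape(phrase) + r"\b", ")", s)
    # Whatever words remain form sentences with no embedded sentences.
    s = re.sub(r"[a-z']+(?:\s+[a-z']+)*", "c", s)
    return re.sub(r"\s+", "", s)


def is_balanced(s):
    """Check that the parentheses in s are properly nested."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0
```

Here substitute(SENTENCE) yields ((((c)c))((c)c))c, and is_balanced confirms that the result is properly nested. The counter in is_balanced can grow without bound, which is the informal reason that no finite state machine can perform the same check.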

Even if we could impose some upper limit on the length of English words and
sentences, and thus be able to argue that English is regular because it is finite, describing
English as a regular language isn't very useful. For many applications, we would like to
be able to parse English sentences into syntactic structures that correspond to the
sentences' meanings. For example, consider the small fragment of an English grammar
that we gave in Example 11.6:

S → NP VP

NP → the Nominal | a Nominal | Nominal | ProperNoun | NP PP

Nominal → N | Adjs N

N → cat | dogs | bear | girl | chocolate | rifle

ProperNoun → Chris | Fluffy

Adjs → Adj Adjs | Adj

Adj → young | older | smart

VP → V | V NP | VP PP

V → like | likes | thinks | shot | smells

PP → Prep NP

Prep → with


Given the sentence, “The smart older cat smells chocolate”, this grammar
could be used by a parser to produce the parse tree shown in Figure L.2. That tree
corresponds to a natural way of assigning meaning to the sentence. There is one object, the
cat, about whom we know something, namely that she smells chocolate.
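
As a small illustration of how a parser might use this grammar, the following Python sketch enumerates every parse tree of a token list with a memoized search over spans. It is not an efficient or standard parser. It relies on two properties of this particular grammar: every right-hand side has one or two symbols, and there are no cycles of single-symbol rules. The names GRAMMAR and parse are hypothetical, and input tokens must be spelled exactly as in the grammar.

```python
from functools import lru_cache

# The grammar fragment from Example 11.6. Any symbol without an entry
# here is treated as a terminal (a word).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "Nominal"], ["a", "Nominal"], ["Nominal"],
           ["ProperNoun"], ["NP", "PP"]],
    "Nominal": [["N"], ["Adjs", "N"]],
    "N": [["cat"], ["dogs"], ["bear"], ["girl"], ["chocolate"], ["rifle"]],
    "ProperNoun": [["Chris"], ["Fluffy"]],
    "Adjs": [["Adj", "Adjs"], ["Adj"]],
    "Adj": [["young"], ["older"], ["smart"]],
    "VP": [["V"], ["V", "NP"], ["VP", "PP"]],
    "V": [["like"], ["likes"], ["thinks"], ["shot"], ["smells"]],
    "PP": [["Prep", "NP"]],
    "Prep": [["with"]],
}


def parse(tokens, start="S"):
    """Return a tuple of all parse trees for tokens, as nested tuples."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(symbol, i, j):
        if symbol not in GRAMMAR:  # terminal: must match exactly one token
            return (symbol,) if j == i + 1 and tokens[i] == symbol else ()
        found = []
        for rhs in GRAMMAR[symbol]:
            if len(rhs) == 1:
                found.extend((symbol, t) for t in trees(rhs[0], i, j))
            else:
                left, right = rhs
                # Both parts get a non-empty span, so the left-recursive
                # rules NP -> NP PP and VP -> VP PP still terminate.
                for k in range(i + 1, j):
                    for lt in trees(left, i, k):
                        for rt in trees(right, k, j):
                            found.append((symbol, lt, rt))
        return tuple(found)

    return trees(start, 0, len(tokens))
```

For the tokens the smart older cat smells chocolate, parse returns a single tree whose NP subtree groups the smart older cat. For Chris shot the bear with a rifle it returns two trees, one attaching the prepositional phrase to the verb phrase and one attaching it to the bear.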

It is possible to build a regular grammar to describe this tiny subset of English. But
every rule in that grammar must be of one of the following forms: A → aB, A → a, or
A → ε. So any parse tree generated by such a grammar must be shaped like the one
shown in Figure L.3. That shape does not correspond to any natural way of assigning
meaning to the sentence.

FIGURE L.2 The structure of a parse tree corresponds to the meaning of a
sentence. (Parse tree, rooted at S, for: The smart older cat smells chocolate.)

Because, as in the example that we just considered, there may be a difference between
being able to construct a grammar that generates the strings in a language and
being able to construct a grammar that creates useful parse trees for its strings, we will
divide the question, “Does there exist a grammar for language L?” into two parts. Does
there exist a grammar with:

• the necessary weak generative capacity, which we define to be the ability to generate
all and only the strings in L, and

•  the  necessary  strong  generative  capacity,  which  we  define  to  be  the  ability  not  only 
to  generate  all  and  only  the  strings  in  L  but  also  to  generate,  for  each  of  them,  at 
least  one  meaningful  parse  tree. 

Summarizing  our  discussion  so  far: 

• If English is finite, then:

   • Regular grammars have the weak generative capacity to describe it.

   • Regular grammars do not have the strong generative capacity to describe it.

• If English is not finite, then:

   • Regular grammars have neither the weak nor the strong generative capacity to
   describe it.


FIGURE L.3 A parse tree that loses semantically meaningful structure.

While regular grammars do not do a good job of capturing the complete structure of
English sentences, it turns out that finite state machines (FSMs) can be used effectively
in applications where a complete analysis of each sentence is not required. Suppose, for
example, that we want to scan large text databases and look for sentences that contain
patterns that involve objects (as described by noun phrases) that are related by particular
verbs of interest. We might, for example, be looking for articles that talk about corporate
takeovers or articles that talk about elections in South America. In these kinds
of problem domains, systems based on the idea of cascaded FSMs, in which one FSM
runs, creates output, and then passes that output on to the next FSM, have been shown
to be useful.

At this point, we can summarize the bottom line: While finite state machines are
useful for some aspects of English processing, for example morphological analyzers
such as the ones we described in Section L.1, as well as applications where an incomplete
analysis is adequate for the task, they are not enough to solve the entire problem.

.2  Can  English  Be  Described  with  a  Markov  Model? 

Although  finite  state  machines  are  not  powerful  enough  to  describe  all  of  the  rules  that 
distinguish  grammatical  English  sentences  from  ungrammatical  ones,  Markov  models 
(FSMs  augmented  with  probabilistic  information)  can  do  a  very  good  job  of  generating 
English text that appears almost natural. This idea was suggested in [Shannon 1948].
We can build letter-level models, which consider the previous k letters as a basis for
generating the next one. Or we can build word-level models, which predict each word
based on the previous k words. A model that uses k prior outputs (letters or words) is
called a kth-order model. A first-order model is sometimes called a bigram model
(since it is based on the probabilities associated with pairs of outputs). A second-order
model is a trigram model, and so forth.
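
A letter-level kth-order model of this kind can be sketched in a few lines of Python. This is a minimal illustration and is not the code from [Bentley 2000]. It assumes k is at least 1, seeds the output with the first k letters of the training text, and draws each next letter from the letters that followed the current context in that text. The names build_model and generate are hypothetical.

```python
import random
from collections import defaultdict


def build_model(text, k):
    """Map each k-letter context to the list of letters that followed it."""
    model = defaultdict(list)
    for i in range(len(text) - k):
        model[text[i:i + k]].append(text[i + k])
    return model


def generate(text, k, length, seed=0):
    """Generate up to `length` letters from a kth-order model of `text`."""
    rng = random.Random(seed)
    model = build_model(text, k)
    out = text[:k]  # start from the opening context of the training text
    while len(out) < length:
        followers = model.get(out[-k:])
        if not followers:  # context seen only at the very end of the text
            break
        # A letter that followed this context often is listed often, so
        # a uniform choice samples the estimated conditional probability.
        out += rng.choice(followers)
    return out
```

Because each appended letter followed the same k-letter context somewhere in the training text, every window of k + 1 letters in the output also occurs in the training text. A word-level model works the same way with tuples of k words as the contexts.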

In  one  of  his  Programming  Pearls  columns  [Bentley  2000],  Jon  Bentley  describes 
his  experiment  (down  to  the  level  of  providing  code)  in  building  Markov  models  of 
English. Let's first consider predicting letters. Using probabilities acquired from
990  Kb  of  the  text  of  this  book,  Bentley’s  Markov  model  produced  the  following 
example  strings: 

• (k = 1): a a idjume Thicha lanbede f nghecom isonys rar t r ores aty Ela ancuny,
ithi.  witheis  weche 

• (k = 2): Ther to for an th she con simach a so a impty dough par we forate for len
poslrit  cal  nowillopecide  allexis  interne  numbectionsityFSM  Cons  onste  on  codere 
elexpre  ther 

• (k = 3): Ouput that the collowing with to that we'll in which of that is returesult is
alway  ther  is  id,  the  cal  on  the  Prove  be  and  N. 

• (k = 4): Notice out at least to steps if new Turing derived for explored. What this to
check  solved  each  equal  string  it  matrix  (i.  k.y  must  be  put  part  can  may  generated 
grammar  in  D. 

•  (k  =  5):  So  slates,  and  Marling  rules  of  strings.  We  may  have  been  regions  to  see.  a 
list.  If  ?  ?  unrestricted  grammars  exist  a  devices  are  constructive-state  i  back  to 
computation 

•  (k  =  6):  We’ll  have  letter  substituted  languages  that  L(G)  since  we  drop  the  ad¬ 
dress  to  the  rulel  were  counterexample,  that  is  that  we  are  true  when  we  switched 
in  how  we 

• (k = 7): If it does a context-free language 3. If the model of which corresponding
b's. M must have chosen strings as a tree such characters of some p.

While this may seem like a silly exercise, note that, by k = 3, very word-like strings
appear. This observation served as the basis for Typo [Morris and Cherry 1975], an
effective early spell checking system. Typo had to fit into a very small amount of memory,
so it was impossible to store a dictionary. Instead, it computed trigram frequencies and
used them to find letter sequences that were very unlikely to occur in correctly spelled
English words.
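
The idea of flagging unlikely letter sequences can be sketched as follows. This is only an illustration of trigram counting and is not the actual Typo algorithm, whose scoring is more refined. The word-boundary marker # and the names trigrams, train, and rarest_trigram_count are assumptions of the sketch.

```python
from collections import Counter


def trigrams(word):
    """Return the letter trigrams of a word, with # marking its boundaries."""
    padded = "#" + word.lower() + "#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(words):
    """Count the trigrams in a collection of (presumably correct) words."""
    counts = Counter()
    for word in words:
        counts.update(trigrams(word))
    return counts


def rarest_trigram_count(word, counts):
    """Return the training count of the least frequent trigram in word."""
    return min((counts[t] for t in trigrams(word)), default=0)
```

After counts = train(["the", "then", "there", "other", "them"]), the word teh scores 0 because its trigram #te never occurs in the training words, while the scores 1. A checker built this way would report the lowest scoring words as the most suspicious, with no dictionary stored.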

Now  let’s  consider  building  word-level  Markov  models.  Again  using  probabilities 
acquired from 990 Kb of the text of this book, Bentley's Markov model produced the
following  example  strings: 

• (k = 1): there exists at the idea was presented for some finite state 3 together. So
U  begins  in  this  approach,  it  is  labeled  with  wj  as  some  model  to  position-list, 
solve-15  can  reduce  every  derivation  becomes  Ml  and  the  number  of  A  building 
efficient  algorithms. 

• (k = 2): The language to be if the various grammar formalisms in which they were
deposited make no moves are possible. The competition can come from somewhere.
Fortunately, there are edges from level nodes to level nodes. Now suppose
that we do with a successful value.

• (k = 3): Compare A and w. If equal, accept (i.e., go to n). Nondeterministic
Semideciding Next we must decide what it means for a nondeterministic Turing machine
M that decides it. From M, we can build a Turing machine because, with grammars,
we do not have to worry about a dead state.

• (k = 4): Again, let st be the working string at any point in its computation it will
have executed only a finite number of squares can be nonblank. And, even if M
never halts, at any point in its computation it will have executed only a finite number
of choices at each derivation step and since each path that is generated must
eventually end, the Turing machine M that computes it.

• (k = 5): Is there any computational solution to the problem? • If there is, can it
be implemented using some fixed amount of memory? • If there is no such element,
then choose will: • Halt and return False if all the actions halt and return
False. • Fail to halt if there is no mechanism for determining that no elements of S
that satisfy P exist. This may happen either because v and y are both nonempty
and they both occur in region n.

• (k = 6): Consider a programming language that requires that variables be declared
before they are used. If we consider just a single variable w, then a program that
declares w and then uses it has a structure very similar to the strings in the language L
are formed. The set of all possible strings over an alphabet Σ is written Σ*. This
notation exploits the Kleene star operator, which we will define more generally below.

Notice that, by k = 4, most of what is generated is literal text from the training
corpus. At this point most of the n-gram probabilities are 0. With a larger, more
heterogeneous training corpus, this phenomenon would be less marked.

[Copeland and Haemer 2001] describes one concrete use for randomly generated text:
testing text-processing software. Random text will push the software into obscure corners
that the designers may not have considered but that may show up eventually in real text.

Another clever use of randomly generated text exploits the fact that Markov
models can generate text that seems “natural” except to people. Consider the problem
of generating spam that will make it through a spam filter. We'll call the spam
message that we want to send S. We can't simply send S as text. If we do that, filters
can be trained to recognize and filter it. But, at least with current technology, spam
filters can't read images. So we can hide S in an image. But spam filters can be tuned
to reject messages that contain nothing but images. They can also be tuned to recognize
and reject text that is completely random. But, using a Markov model of English,
it is possible to produce, for each copy of S that we want to send, a new
paragraph of text that can pass any statistical test of Englishness. This technique can
produce a message like, “There was something gipsy-like and agreeable in the dinner,
after confident in the character and behaviour of the girl who never was then, of
not having been to sleep at all, and by the uncommon Well, I dont know, replied
Steerforth, coolly. You may as well This was formerly the castle of the redoubted
giant Despair, That affair of the first bond for four thousand five hundred grammar:
so that, for a brother and sister, we made a most uneven pair.”


.3  Is  English  Context-Free? 


If  regular  grammars  are  not  powerful  enough  for  English  and  probabilistic  FSMs  only 
approximate  English,  what  about  context-free  grammars?  Suppose  that  we  want  to 
build  a  computational  grammar  of  English  that  could  be  used  as  part  of  any  of  a  wide 
class  of  applications  including  spoken  English  interfaces,  machine  translation  systems, 
and  text-based  information  retrieval  engines.  Is  it  possible  to  build  a  context-free 
grammar  for  English  that  possesses  the  following  three  properties: 


1. weak generative capacity: In other words, is English formally context-free?

2. strong generative capacity: In other words, does there exist a context-free grammar
that generates parse trees that can reasonably be used to derive meanings
from English sentences?

3. good engineering: In other words, is the grammar modular? Does it capture
appropriate generalizations so that important structural concepts, such as Noun
Phrase, can be described only once? Is it easy to build and maintain?

We'd like the answers to these questions to be yes. As we've seen, there exist
straightforward and relatively efficient parsers for context-free grammars. Since such
parsers do not exist for unrestricted grammars, the question of whether English is
effectively context-free is important.

In a nutshell, the answers to questions 1 through 3 above appear to be yes, mostly,
and no, respectively. As a result, most computational grammars of English start with a
context-free core and then add exactly enough additional mechanism to handle the
phenomena that cannot easily be captured in a pure context-free system.


Is  English  Formally  Context-Free? 

We begin by attempting to answer the first question, “Is English formally
context-free?” Much of English clearly can be described by a (very greatly enhanced) version
of the simple context-free grammar that we gave in Example 11.6. We require enhancements
to deal with a variety of constructs that are missing from our simple grammar.
For example, English allows several kinds of embedded structures, including the ones
(if/then, either/or, and the man who said S) that we used in our argument that English
wasn't regular. Those structures are easily described by a context-free grammar. The
three rules we gave for those structures when we introduced them are context-free.
Although the last of them is clearly a special case rule that we wouldn't want in a real
grammar, we'll keep it here for convenience. So we could generate, for our sentence
about the man and the rain, the parse tree shown in Figure L.4. In drawing it, we use
the convention that a triangle describes a subtree with whose internal structure we are
not currently concerned.


FIGURE L.4 A parse tree for a deeply embedded sentence.

But English also appears to allow structures that are not properly nested. It is those
structures that have formed the basis for some arguments that English is not formally
context-free. For example, consider the respectively construct:

Chris and the girls runs and swim respectively.

The relationships between the nouns and the verbs in this sentence link Chris to runs
and the girls to swim. Instead of being nested, the lines that connect them cross.
Patterns such as this are called cross-serial dependencies, and they cannot naturally be
described with context-free rules. Recall that we observed cross-serial dependencies in
two noncontext-free languages that we considered in Chapter 13: WcW and WW.

Suppose that there is no bound on the number of nouns and verbs that can occur in
a respectively construction in English, and further suppose that there is some relationship
that must hold between each noun and its corresponding verb. In particular, in
English, present tense verbs must agree with their subjects in number. So the first of
the following sentences would be grammatical. The second wouldn't (which we indicate
by preceding it with *):

Chris  and  the  girls  runs  and  swim  respectively. 

*  Chris  and  the  girls  runs  and  swims  respectively. 

With these assumptions, it appears that English is not context-free. We can prove that
claim as follows: If English is context-free, then any subset that is formed by intersecting
English with a regular language must also be context-free (by Theorem 13.7). Let:

L = {English sentences} ∩
    {Chris (and (Chris ∪ the girls))* runs (and (run ∪ runs))* respectively}.

For any sentence w in L, let s be the string of noun phrases and conjunctions at the
beginning of w and let s_verb be s with each noun phrase replaced by its matching verb.
Then the entire sentence is of the form:

s s_verb respectively.

L is the set of all strings of that form. We can show that L is not context-free by using
the context-free Pumping Theorem in a proof analogous to the one we used in
Example 13.6 to show that WW = {ww : w ∈ {a, b}*} is not context-free.
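
The pairing that defines L can be made concrete with a short program. The following
Python sketch is only an illustration: the table VERB_FOR and the functions
split_conjuncts and in_L are names introduced here, not part of the argument above,
and the check covers only the pairing condition (the ith noun phrase must match the
ith verb), not membership in English as a whole.

```python
# Hypothetical lexicon: each noun phrase and the verb form that agrees with it.
VERB_FOR = {"Chris": "runs", "the girls": "run"}

def split_conjuncts(words):
    """Split a list of words on 'and' into a list of conjunct strings."""
    conjuncts, current = [], []
    for word in words:
        if word == "and":
            conjuncts.append(" ".join(current))
            current = []
        else:
            current.append(word)
    conjuncts.append(" ".join(current))
    return conjuncts

def in_L(sentence):
    """Return True if sentence has the form  s s_verb respectively."""
    words = sentence.rstrip(".").split()
    if len(words) < 3 or words[-1] != "respectively":
        return False
    words = words[:-1]
    verb_forms = set(VERB_FOR.values())
    # The verb part begins at the first word that is a verb form.
    start = next((i for i, w in enumerate(words) if w in verb_forms), None)
    if start is None:
        return False
    nouns = split_conjuncts(words[:start])
    verbs = split_conjuncts(words[start:])
    # Cross-serial check: the i-th noun phrase must match the i-th verb.
    return len(nouns) == len(verbs) and all(
        VERB_FOR.get(n) == v for n, v in zip(nouns, verbs))

print(in_L("Chris and the girls runs and run respectively."))
print(in_L("Chris and the girls runs and runs respectively."))
```

The comparison walks both halves of the sentence in the same left-to-right order.
That is the crossing pattern that a single stack handles badly: a pushdown automaton
that pushed the noun phrases would pop them in reverse order.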

The problem with this argument, and many like it, has been pointed out in [Pullum
and Gazdar 1982] and [Pullum 1984]. Although the mathematics are correct, the
empirical observations about English are not. In principle, each noun in one of our
respectively sentences should agree with its corresponding verb. But, when native
speakers are asked which of the following sentences is more grammatical, they choose
the second rather than the first:

Dan  and  Pat  runs  and  swims,  respectively. 

Dan  and  Pat  run  and  swim,  respectively. 
So far, no convincing arguments that English is not context-free have been
discovered. However, there are arguments that some other languages are not. One of those
arguments was presented in [Shieber 1985] and in [Huybregts 1984]. It concerns
Swiss German, which includes sentences like the following, which do contain
cross-serial dependencies:

Jan säit das mer em Hans es huus hälfed aastriiche.

(Arrows link em Hans to hälfed and es huus to aastriiche.)

Jan says that we Hans/DAT the house/ACC helped paint.

In English, we'd indicate the direct and indirect objects with word order. So, in this
case, we'd say, “Jan says that we helped Hans paint the house.” In Swiss German, on the
other hand, word order is more flexible; instead, the nouns themselves are marked for
their syntactic function in the sentence. The subject of a sentence will end with a
nominative marker; the direct object will end with an accusative (ACC) marker; and an
indirect object will end with a dative (DAT) marker. The verb “help” requires a dative
object and the verb “paint” requires an accusative one. So there is only one
interpretation of this sentence, the one shown with the arrows, in which Hans got helped and the
house got painted.

Of the three questions that we asked a couple of pages ago, we have now answered
the first: Context-free grammars appear to have the weak generative capacity to
describe English but not all other languages.

Can  Context-Free  Grammars  Generate  Good  Parse  Trees  for  English? 

We now move on to the second question: Do context-free grammars have the strong
generative capacity to describe English? As we said above, the answer to this question
is, “mostly”. We've already seen a few examples of reasonable trees that can be built by
the simple context-free rules that we've so far considered.

But  there  are  structures  that  are  hard  to  capture  with  context-free  rules.  Consider 
the  following  sentence: 

What  did  Will  say  he  ate  for  lunch? 

The problem in building a parse tree for this sentence is that it contains a gap. The
object of the verb ate is the word what, which comes all the way at the other end of the
sentence. So the structure that we would like to build corresponds to:

What  did  Will  say  he  ate  _  for  lunch? 


Describing structures such as this with a natural context-free grammar is difficult.

A second issue, which we considered briefly in Example 11.22, is that many English
sentences are syntactically ambiguous. So any reasonable context-free grammar
of English may be able to generate not just the correct parse tree (i.e., the one that
corresponds to the intended meaning of the sentence) but also one or more additional
parse trees, each of which corresponds to some other, unintended meaning. We'll
return to this issue in the next section.


Is  Using  A  Pure  Context-Free  Grammar  Good  Engineering? 

Let's now consider our third question: Can we build a context-free grammar for English
that is also a sound engineering artifact? For example, how good is the small grammar
that we have been considering? We begin by observing that our grammar
generates, among other things, the following sentences:

Chris  likes  the  cat. 

*  The  dogs  likes  the  cat. 

The first of these is English. The second is not because its subject fails to agree with
its verb in number. So we have found an error in our grammar: It overgenerates because
it fails to account for the constraint that, in English present tense sentences, the
subject and the verb must agree. The problem arises from the first rule in the grammar:

S → NP VP


Once  an  S  has  been  divided  into  the  two  constituents  NP  and  VP ,  the  subsequent 
derivations  of  the  two  subtrees  proceed  independently. 

We  could  fix  this  problem  by  replacing  our  single  S  rule  by  a  pair  of  rules: 

S → SNP SVP    /* S is a singular NP followed by a singular VP.

S → PNP PVP    /* S is a plural NP followed by a plural VP.

But then we'd also have to create two copies of the rules for forming NPs and two
copies of the rules for forming VPs, one for singular phrases and one for plural phrases.
This is theoretically possible, of course, but it is very bad engineering.

The  problem  is  made  worse  by  the  fact  that  there  are  other  similar  phenomena. 
Consider  the  following  sentences: 

The  girl  likes  herself. 

*  The  girl  likes  himself. 


We could solve this problem by again splitting the grammar so that we have the
top-level rules:

S → MNP MVP    /* S is a masculine NP followed by a VP whose object,
                  if reflexive, is masculine.

S → FNP FVP    /* S is a feminine NP followed by a VP whose object,
                  if reflexive, is feminine.

But now we have four kinds of NPs and four kinds of VPs. This is even worse engineering.

Unfortunately,  these  aren’t  the  only  phenomena  of  this  sort.  Consider  the  following 
pairs  of  sentences: 

This  cat  likes  chocolate. 

*  These  cat  likes  chocolate. 
The cat likes chocolate.

* The cat sleeps chocolate.

The problem in the first pair of sentences is that, while most English modifiers are
not marked for number, a few, including the demonstratives this and these, are. We
can solve this problem similarly to the way we solved the problem of subject/verb
agreement. The problem in the second pair of sentences is that verbs have properties that
determine the arguments that can be provided to them. Transitive verbs, including likes,
may take a direct object (such as chocolate). But intransitive verbs, including sleeps,
may not have a direct object. Again, we can solve this problem by creating new classes
of verbs and replicating all the parts of the grammar that describe what verbs can do.
But the size of the grammar that we'll build, if we take this approach, grows
exponentially in the number of distinctions that we make as we try to solve these problems.
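
To see how quickly the copies multiply, it may help to enumerate them. The Python
sketch below is a rough illustration under assumed names: the dictionary FEATURES and
the function specialized_copies are introduced here, and the three binary distinctions
are simply the ones mentioned in the discussion above. Each added distinction
multiplies the number of specialized nonterminals.

```python
from itertools import product

# Hypothetical distinctions of the kind discussed above, each with two values.
FEATURES = {
    "number": ["singular", "plural"],
    "gender": ["masculine", "feminine"],
    "transitivity": ["transitive", "intransitive"],
}

def specialized_copies(base, features):
    """Return one nonterminal name per combination of feature values."""
    names = []
    for combo in product(*features.values()):
        names.append(base + "[" + ",".join(combo) + "]")
    return names

np_copies = specialized_copies("NP", FEATURES)
print(len(np_copies))   # 2 * 2 * 2 copies of NP
print(np_copies[0])
```

With k independent binary distinctions there are 2^k combinations, and every rule
that mentions an affected nonterminal has to be replicated for each of them.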

What  Parsing  Techniques  Work? 

So  what  should  we  try  next?  One  idea  is  to  move  outward  in  the  language  hierarchy. 
Suppose  that,  instead  of  trying  to  describe  English  with  a  context-free  grammar,  we 
used  a  context-sensitive  one.  All  of  the  problems  that  we  just  examined  could  be  solved 
in  a  reasonable  way  if  we  did  that.  But  then  we'd  have  a  new  problem.  Recall  from 
Chapters  24  and  29  that  the  best  known  algorithm  for  examining  a  string  and  deciding 
whether  or  not  it  could  be  generated  by  an  arbitrary  context-sensitive  grammar  takes 
time  that  is  exponential  in  the  length  of  the  input  string.  That's  a  heavy  price  to  pay  to 
be able to handle the relatively small number of features that cannot easily be
described with a context-free grammar. As a result, the way practical English parsers
work  is  to: 

•  Give  up  trying  to  handle  constraints  like  agreement  within  the  set  of  context-free 
rules. 

•  But don't jump all the way to context-sensitive ones. Instead, start with a set of
context-free rules and augment them with features that must match in order for
constituents to be combined.

In particular, a common approach is to exploit a feature grammar (often called a
unification grammar) of the sort described in Section 24.3. So, for example, we might
handle the subject/verb agreement problem by rewriting our first grammar rule as
shown in Example 24.3:

[CATEGORY S] → [CATEGORY NP    [CATEGORY VP
                NUMBER x₁       NUMBER x₁
                PERSON x₂]      PERSON x₂]

Now each constituent (a complete sentence, a noun phrase, or a verb phrase) is
augmented with two features, person and number. The values for the person and number
features of the NP and the VP must match since they are specified with the same
variable. They are then shared with the S.
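
The matching requirement can be sketched in a few lines of Python. This is a minimal
illustration, not the notation of Section 24.3: constituents are represented as flat
dictionaries, a missing feature or the value None stands for an unbound feature, and
the function names unify and combine_s are assumptions made for this example.

```python
def unify(f1, f2):
    """Unify two flat feature dicts; return the merged dict, or None on a clash.
    A missing key or a value of None stands for an unbound feature."""
    result = dict(f1)
    for key, value in f2.items():
        if result.get(key) is None:
            result[key] = value
        elif value is not None and result[key] != value:
            return None
    return result

def combine_s(np, vp):
    """Apply S -> NP VP: NUMBER and PERSON of the NP and the VP must unify."""
    if np.get("CATEGORY") != "NP" or vp.get("CATEGORY") != "VP":
        return None
    agreement = unify(
        {k: np.get(k) for k in ("NUMBER", "PERSON")},
        {k: vp.get(k) for k in ("NUMBER", "PERSON")},
    )
    if agreement is None:
        return None
    # The shared values are passed up to the S.
    return {"CATEGORY": "S", **agreement}

chris = {"CATEGORY": "NP", "NUMBER": "singular", "PERSON": 3}
the_dogs = {"CATEGORY": "NP", "NUMBER": "plural", "PERSON": 3}
likes_the_cat = {"CATEGORY": "VP", "NUMBER": "singular", "PERSON": 3}

print(combine_s(chris, likes_the_cat))     # an S with shared features
print(combine_s(the_dogs, likes_the_cat))  # None: NUMBER values clash
```

A single rule now covers both the singular and the plural case, because the
agreement check is carried by the features rather than by separate nonterminals.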

If the number of feature/value pairs is finite, then it is possible to compile a
grammar, written in this way, into a standard context-free grammar that uses a distinct
nonterminal or terminal symbol for each feature collection. But, in the case of a
realistic English grammar, the compiled grammar would be very large. In Chapter 15,
we saw that there exist algorithms (for example CKY and Earleyparse) that can
parse a string of length n, given an arbitrary context-free grammar G, in time that is
O(n³) if we take the size of G to be a constant. If, on the other hand, we consider the
size of G, then the time required to run either algorithm grows as O(n³ · |G|²). It has
generally been found to be substantially more efficient to use a feature-based
grammar directly, rather than to attempt to compile it into a context-free (and very much
larger) one.


L.3.4 Ambiguity

If we want to build programs that can analyze English sentences, it is not sufficient to
construct grammars with the weak generative capacity to generate all and only the
syntactically legal English sentences. Any useful grammar must also assign, to each
sentence, a parse tree whose structure corresponds to the semantically significant
constituents of the sentence.

The fragment of an English grammar that we have been using allows prepositional
phrases to be attached both to noun phrases and to verb phrases. Because a
prepositional phrase can be attached to a noun phrase, it is possible to produce a semantically
meaningful parse tree for the sentence, Chris likes the girl with the cat. The
tree shown in Figure L.5 makes it clear that there is a girl with a cat and Chris likes her.

Unfortunately, because a prepositional phrase can be attached to a verb phrase, it is
also possible to produce the parse tree shown in Figure L.6. It corresponds to a
nonsensical interpretation in which Chris uses a cat to like the girl.


FIGURE  L.5  A  semantically  meaningful  parse  tree. 
FIGURE L.6 A nonsensical parse tree.


But the ability to attach prepositional phrases to verb phrases is what makes it
possible to construct a semantically coherent parse tree for the sentence, Chris shot the
bear with a rifle. That tree, shown in Figure L.7, makes clear that the rifle is the
instrument that was used in the shooting. Again though, it is also possible to attach the
prepositional phrase to the nearest noun phrase. So our grammar can produce the (in
most contexts) nonsensical parse tree shown in Figure L.8.

The problem we must face is that any English grammar that is powerful enough
to produce the parse trees that we want is likely to contain attachment ambiguities
similar to the dangling else problem that occurs in many programming languages.
Because of those ambiguities, it is typically possible to produce parse trees that do
not correspond to the intended meaning of some sentences. For programming
languages, the solution to ambiguity problems is to design them away. So, for example,
in Section G.3, we discussed two techniques for designing away the dangling else
problem.
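
The attachment ambiguity can be observed mechanically by counting parse trees. The
Python sketch below uses a CKY-style chart for that purpose. The grammar in BINARY
and LEXICON is a hypothetical fragment written in Chomsky normal form for this
example; it is not the grammar used earlier in this appendix, although it allows the
same two attachments of a prepositional phrase.

```python
from collections import defaultdict

# Hypothetical grammar fragment in Chomsky normal form: (parent, left, right).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attached to a verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attached to a noun phrase
    ("PP", "P", "NP"),
]
LEXICON = {
    "Chris": ["NP"], "likes": ["V"], "the": ["Det"],
    "girl": ["N"], "cat": ["N"], "with": ["P"],
}

def count_parses(words, start="S"):
    """Count the distinct parse trees that derive words from start."""
    n = len(words)
    # chart[i][j][A] = number of trees deriving words[i:j] from nonterminal A
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            chart[i][i + 1][category] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in BINARY:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]

print(count_parses("Chris likes the girl with the cat".split()))
print(count_parses("Chris likes the girl".split()))
```

For the sentence with the prepositional phrase, the count is two: one tree for each
attachment. Nothing in the chart says which of the two is the intended one.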


FIGURE L.7 Another semantically meaningful parse tree.

FIGURE L.8 Another nonsensical parse tree.


For English, however, we don't get to make the rules. The language is a naturally
occurring phenomenon and all we can do is describe it. This means that we must
generally accept a grammar that can generate multiple parse trees for many sentences. Then
we require some additional mechanism for choosing the correct one. This almost
always requires appeal to some model of the domain of discourse. The model may be a
statistical one that has been built by examining a large corpus of English sentences
(possibly accompanied by hand-built parse trees). In Section 11.10, we introduced
stochastic context-free grammars, which are one way in which such probabilistic
information can be encoded and exploited during parsing. An alternative is to exploit an
explicit model of facts about the world. So, for example, we could encode the fact that
rifles are used for shooting. In I.3.3, we discussed one way to do this: Build an ontology
that describes objects and their properties. An alternative is to build a rule-based
system of the sort we describe in M.3.


L.3.5 Other Reasons English Syntax is Hard

We  conclude  this  brief  discussion  of  English  grammar  with  three  final  issues,  which  are 
illustrated  by  the  following  sentences: 

1.  * Furiously sleep ideas green colorless.

2.  Colorless  green  ideas  sleep  furiously. 

3.  Chris  cooked. 

4.  The  potatoes  cooked. 

5.  Chris  and  the  girls  cooked. 
6.  * Chris and the potatoes cooked.

7.  ?The  window  needs  cleaned. 

The first two sentences are due to [Chomsky 1957]. The first is unarguably not
English. But what about the second? It satisfies all the standard rules of syntax. Yet
it feels “wrong”. The problem is that it doesn't “make sense.” Now consider
sentences 3-6. The first three are fine. But sentence 6 is wrong, even though it is very
similar to sentence 5. The problem is that NPs may be conjoined only if they all fill
the same semantic role slot with respect to the verb. In the sentence, Chris cooked,
Chris is the agent (the entity that is causing the cooking to happen). In the sentence,
The potatoes cooked, the potatoes are the patient or affected entity (the thing to
which the cooking is being done). So the two phrases, Chris and the potatoes,
cannot be conjoined. Writing grammars for English is complicated by the fact that
there is no clear line between sentences whose syntax is wrong and sentences that
violate various semantic constraints.

Last, consider sentence 7. In western Pennsylvania, this sentence is regarded as
perfectly fine English. In most of the rest of the world, it isn't. All natural languages that
have  more  than  a  few  speakers  have  more  than  one  dialect.  It  is  hard  to  write  a  formal 
grammar  for  a  language  whose  elements  we  cannot  agree  on. 


L.3.6  Stochastic  Grammars 

Stochastic context-free grammars play an important role in computational linguistics
because it is not possible to build practical, unambiguous grammars for natural
languages such as English. Lacking the ability to find a single parse, the next best thing is
to find the most likely one.
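
As a rough sketch of what "most likely" means here, the Python fragment below scores
two competing parse trees for Chris shot the bear with a rifle by multiplying the
probabilities of the rules they use. Everything in it is assumed for illustration: the
tree encoding, the names tree_prob and most_likely, and above all the numbers in
RULE_PROB, which are invented rather than estimated from a corpus. Lexical rules
are treated as having probability 1, so the table is not a properly normalized
grammar.

```python
# Invented probabilities for the structural rules; not estimated from data.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.6,
    ("VP", ("VP", "PP")): 0.4,
    ("NP", ("Det", "N")): 0.7,
    ("NP", ("NP", "PP")): 0.1,
    ("PP", ("P", "NP")): 1.0,
}

def tree_prob(tree):
    """Probability of a tree given as (label, child, child, ...); leaves are words."""
    label, *children = tree
    if all(isinstance(child, str) for child in children):
        return 1.0   # lexical rule, treated as probability 1 in this sketch
    child_labels = tuple(child[0] for child in children)
    prob = RULE_PROB[(label, child_labels)]
    for child in children:
        prob *= tree_prob(child)
    return prob

def most_likely(trees):
    """Return the tree with the highest probability."""
    return max(trees, key=tree_prob)

BEAR = ("NP", ("Det", "the"), ("N", "bear"))
RIFLE_PP = ("PP", ("P", "with"), ("NP", ("Det", "a"), ("N", "rifle")))

# PP attached to the verb phrase (the rifle is the instrument).
VP_ATTACH = ("S", ("NP", "Chris"),
             ("VP", ("VP", ("V", "shot"), BEAR), RIFLE_PP))
# PP attached to the noun phrase (the bear has the rifle).
NP_ATTACH = ("S", ("NP", "Chris"),
             ("VP", ("V", "shot"), ("NP", BEAR, RIFLE_PP)))

best = most_likely([NP_ATTACH, VP_ATTACH])
print(best is VP_ATTACH)
```

Note that probabilities attached only to structural rules would prefer the same
attachment for Chris likes the girl with the cat, where it is the wrong one. This is
one reason that practical systems also condition on the words involved.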

L.4  Building  a  Complete  NL  System 

A  complete  system  that  analyzes  natural  language  text  typically  performs  all  of  the  fol¬ 
lowing  tasks: 

•  morphological analysis and part of speech tagging,

•  syntactic parsing of each sentence,

•  semantic  interpretation  of  individual  sentences,  and 

•  interpretation  of  sentences  in  the  larger  context  of  the  rest  of  the  text. 

These processes may be done in stages, or they may be integrated in a variety of
ways. We discussed the first step, morphological analysis and part of speech tagging,
earlier in this chapter. We summarized some of the issues involved in natural
language parsing in Section 15.4. In particular, we introduced the notion of bottom-up
chart parsing, which forms the basis for many NL parsers. The last two tasks are
beyond the scope of the techniques that we have described in this book. In particular,
semantic interpretation must be done in a way that supports the particular
application of which the NL system is part. Generally it must rely on some model of the
task domain and it may require the ability to reason about objects in that model. In
I.3.3 and M.2, we comment briefly on the implications of our undecidability results
for our ability to build arbitrary reasoning engines.


L.5 Speech Understanding Systems

So  far,  we  have  discussed  natural  language  processing  as  though  language  were  ex¬ 
clusively  a  written  phenomenon.  But  now  imagine  talking  to  your  computer  rather 
than  typing.  Consider  the  levels  of  analysis  required  to  make  that  idea  a  reality.  A 
speech  understanding  program  must  solve  all  of  the  problems  that  we  have  already 
mentioned: 


•  morphological  analysis  and  part  of  speech  tagging, 

•  syntactic  parsing  of  each  sentence, 

•  semantic  interpretation  of  individual  sentences,  and 

•  interpretation  of  sentences  in  the  larger  context  of  the  rest  of  the  text. 

And  it  must  solve  new  ones,  including: 

•  word  recognition  from  a  sound  wave,  and 

•  recovery from the kinds of errors that people often make when they talk. For
example, consider the following utterance, which might not be unusual in an airline
reservation system:

I  need  to  get  there  by  uh  noon  no  er  make  that  eleven. 

Hidden Markov models (HMMs) are widely used to build practical speech
understanding systems. For a good introduction, see [Rabiner 1989] or [Jurafsky and
Martin 2000]. Two approaches are common:

•  Use HMMs for word recognition from sound waves. Then use other techniques as
appropriate to build the larger system. In this approach, we train a collection of
HMMs, one for each word (or possibly phrase) in the active lexicon. Then, given an
observed sound sequence, we use the forward algorithm to solve the evaluation
problem. In other words, we find the HMM that has the highest probability of
having emitted that sound sequence. We assert that the word or phrase that we heard is
the one that corresponds to that highest-scoring HMM. This approach is useful for
isolated word recognition.

•  Use an integrated HMM model of the entire process by which a concept to be
communicated is mapped to a sentence and then a string of sounds. Now, to understand
an utterance, we use the Viterbi algorithm to solve the decoding problem, i.e., we
find the maximum likelihood path through the network. This approach is useful for
continuous speech processing, where there are no pauses (and thus no clear
boundaries) between words.
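
The decoding step of the second approach can be sketched as follows. This Python
fragment is a minimal version of the Viterbi algorithm over a tiny, made-up network:
the states, the symbolic observations ("H", "I", "T") and all of the probabilities
are assumptions chosen for illustration, and a real system would work with acoustic
frames and a far larger network.

```python
def viterbi(states, initial, trans, emit, observations):
    """Return (probability, path) for the most likely state sequence."""
    # best[s] = (probability of the best path ending in s, that path)
    best = {
        s: (initial.get(s, 0.0) * emit[s].get(observations[0], 0.0), [s])
        for s in states
    }
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Choose the best predecessor r for state s.
            prob, path = max(
                ((best[r][0] * trans[r].get(s, 0.0), best[r][1]) for r in states),
                key=lambda candidate: candidate[0],
            )
            new_best[s] = (prob * emit[s].get(obs, 0.0), path + [s])
        best = new_best
    return max(best.values(), key=lambda candidate: candidate[0])

# Toy left-to-right network (all numbers invented).
STATES = ["h", "i", "t"]
INITIAL = {"h": 1.0}
TRANS = {"h": {"h": 0.5, "i": 0.5}, "i": {"i": 0.5, "t": 0.5}, "t": {"t": 1.0}}
EMIT = {"h": {"H": 0.9, "I": 0.1}, "i": {"I": 0.8, "O": 0.2}, "t": {"T": 1.0}}

probability, path = viterbi(STATES, INITIAL, TRANS, EMIT, ["H", "I", "I", "T"])
print(path)
```

The function keeps only the single best path into each state at each step, which is
what distinguishes decoding from the evaluation problem, where the probabilities of
all paths are summed.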

We'll sketch here the use of HMMs for isolated word recognition, which is an
instance of the evaluation problem that we outlined in Section 5.11.2: Given an
observation sequence O and a set of HMMs that describe a collection of possible underlying
models, choose the HMM that is most likely to have generated O. To solve this
problem, we need:

•  a set of HMMs, one for each word (or possibly common phrase) that may have
been said, and

•  an  observed  sequence  of  sounds. 
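
Given those two ingredients, the choice of the best word model can be sketched with
the forward algorithm. The Python fragment below is illustrative only: the two word
models in WORD_MODELS, the symbolic observations and every probability are invented
for this example, and the names forward and recognize are assumptions.

```python
def forward(hmm, observations):
    """Return P(observations | hmm), summing over all state paths."""
    states = hmm["states"]
    alpha = {
        s: hmm["initial"].get(s, 0.0) * hmm["emit"][s].get(observations[0], 0.0)
        for s in states
    }
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[r] * hmm["trans"][r].get(s, 0.0) for r in states)
               * hmm["emit"][s].get(obs, 0.0)
            for s in states
        }
    return sum(alpha.values())

def recognize(word_models, observations):
    """Return the word whose HMM assigns the observations the highest probability."""
    return max(word_models, key=lambda w: forward(word_models[w], observations))

def make_word_model(vowel_state, vowel_emissions):
    """Build a toy three-state, left-to-right model (all numbers invented)."""
    return {
        "states": ["h", vowel_state, "t"],
        "initial": {"h": 1.0},
        "trans": {
            "h": {"h": 0.5, vowel_state: 0.5},
            vowel_state: {vowel_state: 0.5, "t": 0.5},
            "t": {"t": 1.0},
        },
        "emit": {"h": {"H": 0.9, "I": 0.1}, vowel_state: vowel_emissions, "t": {"T": 1.0}},
    }

WORD_MODELS = {
    "hit": make_word_model("i", {"I": 0.8, "O": 0.2}),
    "hot": make_word_model("o", {"O": 0.8, "I": 0.2}),
}

print(recognize(WORD_MODELS, ["H", "I", "I", "T"]))
print(recognize(WORD_MODELS, ["H", "O", "O", "T"]))
```

This sketch compares the likelihoods alone. A prior probability for each word could
be folded in by multiplying each score by that prior before taking the maximum.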

When we hear a word, we generally hear a continuous sound wave, such as the
one shown in Figure L.9. It was generated when the word cacophony was uttered.
The x-axis in the figure represents time (in milliseconds). The y-axis represents the
amplitude of the sound wave. A sound wave can be analyzed at several levels:

•  The sound wave is digitized by sampling at some rate. The signal corresponding to
cacophony was sampled at 22,050 Hz.

•  The samples are combined into slightly larger chunks, called frames. A frame may
correspond to, say, 10 milliseconds of the signal. Each frame is then described by a
set of feature/value pairs. The features capture the properties of the signal that are
important for interpreting it. For example, they may describe how much of the
energy in the signal occurs in each of several frequency bands.

•  A sequence of frames will correspond to a phone. A phone is an abstraction of a
sound. So some of the physical variation that will not affect interpretation is thrown
away. Phones can be represented using one of a small number of standard phonetic
alphabets. One such alphabet, the IPA alphabet, contains over 100 symbols plus
over 50 marks that can be added to those symbols. For example, it includes the
symbols [p] as in penguin, [ð] as in weather, [θ] as in thin, [ɪ] as in lily, [l] as in lily, and
[ʌ] as in cup. An alternative alphabet, the ARPAbet, is widely used in speech
processing systems. It has about 50 phones.

FIGURE L.9 A sound wave corresponding to the word cacophony.

•  The phones can be mapped to phonemes in a particular language such as English.
A phoneme corresponds to a set of phones that function identically with respect
to a particular language. For example, in English, there is no functional
difference  between  an  aspirated  p  and  an  unaspirated  p  (with  or  without  a  puff 
of  air  at  the  end).  In  other  words,  there  are  no  two  words  that  differ  only  in  one 
having  an  aspirated  p  and  the  other  having  an  unaspirated  one.  In  Thai,  on  the 
other  hand,  that  difference  is  as  important  as  the  difference  between  b  and  p  in 
English.  So  aspirated  and  unaspirated  p  correspond  to  the  same  phoneme  in 
English  but  not  in  Thai. 

•  A  sequence  of  phonemes  forms  a  word. 
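
The first two levels, sampling and framing, are easy to sketch. The Python fragment
below cuts a sampled signal into 10-millisecond frames and computes one feature per
frame. It is a simplification under stated assumptions: the signal is a synthetic
440 Hz tone rather than recorded speech, the frames do not overlap, and the single
feature (average energy) stands in for the per-band energies mentioned above. The
names split_into_frames and frame_energy are introduced here.

```python
import math

SAMPLE_RATE = 22050   # samples per second, as in the cacophony example
FRAME_MS = 10         # frame length in milliseconds

def split_into_frames(samples, sample_rate=SAMPLE_RATE, frame_ms=FRAME_MS):
    """Cut samples into consecutive, non-overlapping frames of frame_ms each."""
    frame_len = sample_rate * frame_ms // 1000   # 220 samples at 22,050 Hz
    return [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def frame_energy(frame):
    """Average energy of a frame: the mean of the squared sample values."""
    return sum(x * x for x in frame) / len(frame)

# One second of a synthetic 440 Hz tone (an assumed stand-in for real speech).
signal = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]

frames = split_into_frames(signal)
features = [{"energy": frame_energy(frame)} for frame in frames]
print(len(frames), len(frames[0]))
```

Each frame is reduced to a small set of feature/value pairs, and it is the sequence
of those feature sets, not the raw samples, that serves as the observation sequence
for an HMM.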

One reasonable way to build an HMM for the word recognition task is to let the
states correspond to phones. Then each word model describes the various phone
sequences that could correspond to that word. Associated with each state (phone) is a list
of frames that could describe the sound that would be generated by a speaker when
uttering that phone. Since there is variability across speakers, there may be more than
one such frame. The confusion matrix, B, will contain the probability, for each frame,
that it is the one that is uttered.

To get better accuracy, it may be useful to create three states for each phone:
beginning, middle, and end. That lets us describe a phone, say [t], as a silence followed by a
burst of air. For an even more accurate model, we may want to write out the basic
phone sequence for a word and then apply a model that describes coarticulation
effects. These effects occur because we can't pronounce one phone and then another
without letting our mouth and tongue move continuously from their first position to
their second. While they're moving, other sounds may be produced, or the desired
sound may be altered, effectively due to laziness. (Say the words mitt and mitten to
yourself, with your hand in front of your mouth. Can you hear that the t in mitt is
aspirated but the t in mitten is not?) These effects are independent of the particular word
that is being spoken, however. So we can build individual word models at the phone
level, then apply a richer model of what goes on when each individual phone is spoken,
and then apply a coarticulation model to describe how phones are combined into a
single speech signal. The end result is an HMM whose states correspond to phones
and whose observable outputs correspond to frames that describe physical sounds.
The state structures of these HMMs are typically built by hand, but the probabilities
of the state transitions, as well as the output probabilities, can be extracted from
labeled training data. The initial probabilities for each of the word models can also be
extracted from training data so that the final decision about which word was spoken is
conditioned on the prior probability that anyone would speak that word. In more
sophisticated models, those probabilities can depend on some number of prior words
of context.

So, greatly oversimplifying the picture, an HMM for the two words, hit and hot,
is shown in Figure L.10. The nodes labeled Start and End are the same for both
words, so they haven't been expanded. The difference between the two words is the
vowel in the middle, so we show beginning, middle, and end states for them. The
rectangular boxes connected to those states correspond to what is observed when the
words are spoken. In a real system, those observations will be described as sets of
physical parameters. We've just written numbers that are suggestive of those real
values. For example, we see that the middle states of the two vowels are different, as
indicated by the fact that I-2 has small values for the first and fourth parameters,
and large values for the second and third, while O-2 has the situation reversed. At
recognition time, most of the acoustics for the two words will be the same, but when
the system reaches I-2/O-2, it will be able to make a decision based on whether large
parameter values occur in the middle of the sound or at the edges. Note that, if
similar values for all four parameters are observed, it is likely that the word that was
spoken is neither hit nor hot.

FIGURE L.10 A simplified HMM for two words, hit and hot.
In 1936, before there were computers, Alan Turing described the formal model of
computation that we now call a Turing machine. In 1950, when there were only
handfuls of computers on the planet (and the computer at Manchester, the one
with which Turing was most familiar, had about 20k of memory), Turing again wrote a
visionary paper [Turing 1950], in which he suggested that, within about fifty years, there
would exist a computer that could pass what has come to be called the Turing test.
Although Turing's version of the test (which he called the Imitation Game) was a bit
different, in its modern form it can be stated as follows: Imagine an interrogator who
can type English questions to two agents, A and B, who are in another room. A and B
in turn type responses to the interrogator's questions. The interrogator knows that one
of the two agents is a person and the other is a computer. His job is to figure out which
is which. The job of both A and B is to try to convince the interrogator that they are
human. The computer passes the test if it wins and fools the interrogator into thinking
it is the person. The specific prediction that Turing made was, “I believe that in about
fifty years' time it will be possible to programme computers with a storage capacity of
about 10⁹ to make them play the imitation game so well that an average interrogator
will not have more than 70 percent chance of making the right identification after five
minutes of questioning.” This time, Turing has turned out not to be quite right. There
exist a lot of interactive chatbots that are capable of playing the game according to
Turing's rules. None of them has yet “won” the game. Try conversing with a couple of
them and see if they could fool you.

Turing introduced his game as a way of making the question, “Can a machine think?”
concrete enough that it could be answered. The game is flawed in many ways as a test of
cognition. In his paper, Turing raised several objections and replied to them. Additional
objections have been put forward over the years. Some of the most serious objections include:


•  A  key  aspect  of  human  intelligence  is  our  ability  to  perceive  the  world.  The  test 
doesn't measure sensory perception. So, for example, there is no way for the
interrogator to hand one of the agents a picture and ask what it’s a picture of.

•  A program with a huge store of canned questions and answers might be able to fool
the interrogator, but surely it doesn’t think. This argument is often called the
Chinese room argument [Searle 1980]. Imagine a person who speaks no Chinese but
who  is  locked  in  a  room  with  filing  cabinets  full  of  slips  of  paper.  On  each  slip  of 
paper  is  written,  in  Chinese,  a  question  and  an  answer.  Now  suppose  that  an  inter¬ 
rogator  slides  questions,  also  written  in  Chinese,  under  the  door. The  job  of  the  per¬ 
son  inside  the  room  is  to  find  the  slip  that  matches  the  question,  copy  down  the 
answer,  as  given  on  the  slip,  and  slide  it  back  out  the  door. To  an  outside  observer  it 
may  appear  that  the  person  in  the  room  knows  Chinese  and  can  think  about  an¬ 
swers  to  questions.  But,  since  we  know  that  (s)he  doesn't  know  any  Chinese,  we 
know  that  all  that  is  happening  is  symbol  lookup,  not  thinking.  Might  it  be  that, 
even if a program could pass the Turing test, it isn't thinking any more than the
person in the room was?


While  these  arguments  have  merit  and  human  intelligence  is  complex,  they  don’t 
obscure  the  fact  that  Turing's  fundamental  claim  was  that  computers  are  sufficiently 
powerful, in principle, to be able to act “intelligently”. And while Turing was overly
optimistic in how long it would take before computers rivaled humans in the sort of
everyday  intelligence  that  his  imitation  game  measures,  he  was  right  that  computers 
can  be  programmed  to  perform  many  of  the  kinds  of  tasks  that  we  think  of  as  requiring 
intelligence  when  people  do  them.  In  fact,  almost  as  soon  as  there  were  computers, 
there  were  programs  that  proved  simple  theorems,  played  games  (such  as  checkers  and 
chess),  and  attempted  to  recognize  patterns  in  faces,  symbols,  and  drawings. 

Did  those  programs  exhibit  “artificial  intelligence”?  What  counts  as  “artificial  intel¬ 
ligence”?  It’s  impossible  to  provide  a  rigorous  definition  of  either  “artificial”  or  “intel¬ 
ligence”,  much  less  both  of  them.  We'll  begin  instead  with  a  more  than  20  year  old, 
pragmatic  definition  from  [Rich  and  Knight  1991]:  Artificial  intelligence  (AI)  is  the 
study  of  how  to  make  computers  do  things  that,  at  the  moment,  people  do  better.  Using 
that definition, yes, those early programs exhibited AI. To do justice, though, to much
of the modern work that is being done on the boundary between AI, databases, and the
World Wide Web, it will be useful to expand our definition as follows: Artificial
intelligence (AI) is the study of how to make computers do things that people are better at
or would be better at if they could extend what they do to a World Wide Web-sized
amount of data and not make mistakes.


Over half a century has elapsed since Turing’s paper on intelligent machines. Over
those years, it has become clear that one reason that people can do so much is that they
know a lot. Work on the early problems, as well as new ones, has led to the
development of a variety of techniques for acquiring and representing knowledge and for
reasoning with it. These techniques have been applied to the creation of programs that
read English, navigate highways, examine pictures, play games, and diagnose diseases
(to name just a few of the hundreds of problem domains that have been considered).
Substantial research is now devoted to the construction of intelligent agents: systems
that exploit large knowledge sources and act for their users to solve problems in one or
more domains. We discussed, for example, one branch of this work in I.3, where we
described the design of the Semantic Web.

Large books, for example [Russell and Norvig 2002], can barely scratch the
surface of this area and we can do only a small fraction of that in a few pages here. But
the theory that we have developed in this book has a lot to say about how we might
go about building an intelligent system. One of the most important aspects of human
intelligence is our ability to exploit language. In Appendix L, we considered ways in
which the theory of formal languages, as we have developed it here, informs the
study of natural language. But, in that discussion, we largely ignored the fact that
linguistic utterances are, by and large, about something. In this chapter we will survey a
few of the ways that the theory that we have built informs our attempt to build
programs that know something and that can use what they know to solve problems. For
more examples, see:

•  A  discussion  of  the  programming  language  Lisp,  whose  design  was  inspired  by 
Church's  work  on  the  lambda  calculus  and  whose  structure  is  particularly  well-suited 
to  expressing  many  kinds  of  AI  programs.  (G.5) 

•  A discussion of the use of finite state machines in the design of a controller for an
intelligent, soccer-playing robot. (P.4)

•  A  discussion  of  the  impact  of  complexity  on  the  design  of  programs  that  play  games 
like  chess  and  Go.  (N.2.5) 

•  A  discussion  of  various  techniques  that  are  used  in  the  design  of  intelligently  acting 
agents  in  computer  games.  (N.3) 

•  A discussion of the impact of the undecidability of first-order logic and the
intractability of Boolean logic on the design of the Semantic Web. (I.3)


M.1  The  Role  of  Search 

Search appears to play a significant role in the design of intelligent programs. For
example, theorem-proving programs search a space of possible proofs. Game-playing
programs search a space of possible moves. Natural language understanding programs
search a space of possible sentence parses and then possible meanings that can be
assigned to those parses. Medical diagnosis programs search a space of possible
interpretations of the observed symptoms. Alternatively, we can view intelligent behavior
as pattern matching in which a large set of patterns have been compiled into a
structure that is able to yield answers without obvious search. Much of what goes on in
human cognition may be able to be described in this way. But, in this view, we have
simply traded search in one space for search in another. Instead of searching in a
space of problem solutions, it becomes necessary to train the pattern matchers. And
that process requires search in a space of pattern arrangements. For example, neural
net-based systems search a space of weights that are attached to the connections
between nodes.

While some of the search problems that we would like to solve in AI are undecidable
(theorem-proving in first-order logic, for example), the main way in which the
theory that is presented in this book impacts AI is the complexity of search algorithms. In
Section 30.3, we considered the problem of searching a space that is too large to
enumerate explicitly. That discussion applies widely in AI. For example, the A* search
algorithm, which we presented there, has been used extensively in AI applications, as
have been a variety of extensions and modifications of it. Specialized search algorithms
have also been developed to solve particular kinds of problems. For example,
resolution (as described in B.2.2) is important as a way to limit search in theorem-proving
programs. Another example is the minimax algorithm, described in N.2.5, which
manages search in game trees.

In  the  rest  of  this  chapter,  we  will  consider  the  two  main  problems  that  must  be 
solved  if  we  are  to  build  intelligent  systems.  We  must  find  a  way  to  encode,  acquire,  and 
evolve  the  huge  amount  of  knowledge  that  appears  to  be  required  for  all  but  the  most 
trivial  tasks,  and  we  must  find  a  way  to  manage  search  in  one  or  more  spaces  that  are 
defined  by  that  knowledge.  Over  the  years,  several  approaches  to  the  knowledge  repre¬ 
sentation  problem  have  been  developed.  We’ll  focus  our  discussion  on  two  of  them— 
logical  systems  and  rule-based  systems.  We’ve  chosen  those  two  because  they  represent 
important applications of the theory that we have been discussing. But we should point
out that other promising approaches exist. In particular, statistical approaches,
including both high-level, symbolic techniques as well as neural net models, are widely used
in  many  applications. 


M.2  A Logical Foundation for Artificial Intelligence

One  approach  to  building  artificial  reasoning  systems  is  to  encode  the  relevant  knowl¬ 
edge  in  a  logical  language  and  to  use  the  rules  of  logical  inference  to  solve  problems.  In 
other  words,  we  solve  problems  by  proving  theorems.  One  appeal  of  this  approach,  as 
opposed,  say,  to  encoding  knowledge  procedurally  in  the  code  that  uses  it,  is  that 
knowledge  can  be  stated  independently  of  how  it  is  to  be  used.  That  means  that  the 
same knowledge base may be able to be used for multiple applications. For an
excellent introduction to the logical approach to knowledge representation, see [Brachman
and Levesque 2004].


M.2.1  The Fundamental Issues

To  use  a  logical  representation  as  the  basis  for  building  an  intelligent  software  solution 
to a problem, we must satisfy the same requirements that we considered in the design
of  automatic  program  verification  systems.  So  we  need  a  logical  system  that  meets  all 
of  the  following  requirements: 


1.  It  is  expressive  enough  to  make  it  possible  to  encode  the  knowledge  required  to 
solve  the  problem. 


2.  Its  decidability  properties  are  strong  enough  to  make  it  useful. 

3.  Its  complexity  is  acceptable  for  the  size  problems  we  wish  to  solve. 


Practical reasoning programs generally choose a system that starts with first-order
logic (FOL), which may then be extended, to add expressive power, or restricted, to
add tractability and perhaps decidability.

Extensions to FOL may include one or more higher-order techniques, appropriately
chosen to support the task at hand. For example, many general purpose systems support
first-order logic plus equality. Other systems are enhanced to support reasoning about
time and belief.

Another important extension to FOL allows nonmonotonic reasoning. To define
what we mean by that, note first that reasoning in standard first-order logic is
monotonic, in the sense that adding an axiom may make some additional statements provable.
But no statements that were provable without the new axiom cease to be provable with
it. Suppose, though, that our logic allows statements such as, “If x is a bird, assume it
can fly unless you know that it is a penguin or that it is a baby or that it has a broken
wing or that it fell into an oil slick”. Now, with just the fact that Tweety is a bird, we will
be able to conclude that Tweety can fly. But if the fact that Tweety is a penguin is added
later, that conclusion will no longer be justified. We’ll say that a logical system is
nonmonotonic iff the addition of one or more axioms may remove some sentence from the
set of theorems. Default reasoning of the sort we just saw in the Tweety example is one
of the most common uses for nonmonotonic reasoning. It plays an important role in
applications that must reason with incomplete information.
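
The Tweety example can be made concrete with a small sketch. The Python fragment below is a toy model only, not a real default-logic engine: the names `EXCEPTIONS` and `conclusions` and the encoding of facts as (property, individual) pairs are assumptions introduced for illustration. It shows the defining behavior: adding a fact shrinks the set of conclusions.

```python
# Toy illustration of default (nonmonotonic) reasoning.
# All names here are hypothetical; facts are (property, individual) pairs.
EXCEPTIONS = {"penguin", "baby", "broken_wing", "oil_slick"}

def conclusions(facts):
    """Return everything derivable from facts under one default rule:
    a bird is assumed to fly unless a known exception applies to it."""
    derived = set(facts)
    for prop, who in facts:
        if prop == "bird":
            blocked = any((exc, who) in facts for exc in EXCEPTIONS)
            if not blocked:
                derived.add(("can_fly", who))
    return derived

before = conclusions({("bird", "Tweety")})
after = conclusions({("bird", "Tweety"), ("penguin", "Tweety")})

print(("can_fly", "Tweety") in before)  # conclusion drawn by default
print(("can_fly", "Tweety") in after)   # conclusion withdrawn after the new fact
```

In a monotonic system the second result could never differ from the first, since adding an axiom can only enlarge the set of theorems.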

Going the other direction, it often makes sense to exploit a logical language that is
weaker than FOL. Weaker languages may possess decidability and tractability
properties that FOL lacks. In M.2.3, we’ll mention one such language, the language of Horn
clauses. In Section I.3.4, we mentioned another, description logics.


M.2.2  A  Brief  History  of  Theorem-Proving  Systems 
and  Their  Applications 


The title “first artificial intelligence program” probably belongs to a theorem prover.
The Logic Theorist (or just LT) [Newell, Shaw and Simon 1957] debuted at the 1956
summer Dartmouth conference that is generally regarded as the birthday of the field
called AI. LT did what mathematicians do: It proved theorems. It proved, for
example, most of the theorems in Chapter 2 of Principia Mathematica [Whitehead and
Russell 1910, 1912, 1913]. LT used three rules of inference: substitution (which allows any
expression to be substituted, consistently, for any variable), replacement (which allows
any logical connective to be replaced by its definition, and vice versa), and detachment
(which allows, if A and A → B are theorems, to assert the new theorem B). LT began
with the five axioms given in Principia Mathematica. From there, it began to prove
Principia's theorems. For example, in about 12 minutes it produced the following
proof, for Theorem 2.45:


     ¬(p ∨ q) → ¬p                       (Theorem 2.45, to be proved.)
1.   A → (A ∨ B)                         (Theorem 2.2.)
2.   p → (p ∨ q)                         (Subst. p for A, q for B in 1.)
3.   (A → B) → (¬B → ¬A)                 (Theorem 2.16.)
4.   (p → (p ∨ q)) → (¬(p ∨ q) → ¬p)     (Subst. p for A, (p ∨ q) for B in 3.)
5.   ¬(p ∨ q) → ¬p                       (Detach right side of 4, using 2.)
     Q. E. D.

(Note that both upper and lower case symbols correspond to variables.) The inference
rules that LT used are not complete and the proofs it produced are trivial by modern
standards. For example, given the axioms and the theorems prior to it, LT tried for 23
minutes but failed to prove theorem 2.31:

[p ∨ (q ∨ r)] → [(p ∨ q) ∨ r].

LT's  significance  lies  in  the  fact  that  it  opened  the  door  to  the  development  of  more 
powerful  systems. 
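
Both formulas above are valid propositional formulas, so LT's failure on Theorem 2.31 reflects the limits of its proof search rather than anything about the formula itself. The sketch below is not LT's method (LT searched for derivations from axioms using substitution, replacement, and detachment); it is a brute-force truth-table check, offered only as a sanity check on the two statements. The helper names `implies` and `is_tautology` are assumptions introduced here.

```python
from itertools import product

def implies(a, b):
    """Material implication: a -> b."""
    return (not a) or b

def is_tautology(formula, num_vars):
    """True iff formula is True under every assignment to its variables."""
    return all(formula(*values)
               for values in product([False, True], repeat=num_vars))

# Theorem 2.45: not(p or q) -> not p
theorem_2_45 = lambda p, q: implies(not (p or q), not p)
# Theorem 2.31: [p or (q or r)] -> [(p or q) or r]
theorem_2_31 = lambda p, q, r: implies(p or (q or r), (p or q) or r)
# A formula that is not valid, for contrast: (p or q) -> p
not_valid = lambda p, q: implies(p or q, p)

print(is_tautology(theorem_2_45, 2))
print(is_tautology(theorem_2_31, 3))
print(is_tautology(not_valid, 2))
```

Truth tables take time exponential in the number of variables and do not extend to first-order logic, which is one reason proof search remains the central issue in what follows.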

The designers of LT did not face difficult representational issues. LT's job was to
prove theorems that had already been stated in the language of first-order logic. But
the use of theorem provers is not limited to domains that start out looking like
mathematics. For example, theorem provers have been used to reason about the semantics of
natural language sentences. So attempts have been made to find logical
representations of almost everything that we can talk about.

Practically  useful  theorem  provers  must  work  with  large  sets  of  facts  (axioms)  and 
with  a  stronger  set  of  inference  rules  than  LT  used.  So,  even  if  the  representation  ques¬ 
tion  can  be  answered  and  first-order  logic  can  be  shown  to  be  expressively  powerful 
enough,  two  important  issues  remain: 

Undecidability: First-order logic, given a complete inference procedure, and with
no  restrictions  on  the  axioms  that  may  be  presented,  is  undecidable,  as  we  showed  in 
Theorem  22.4.  So,  while  there  are  algorithms  that  find  proofs  when  they  exist,  any  such 
algorithm  cannot  be  guaranteed  to  halt  and  fail  when  asked  to  prove  a  nontheorem. 

Intractability: First-order logic is typically computationally intractable. It is at least
as hard as Boolean logic and we showed, in Theorem 28.16 (the Cook-Levin Theorem),
that Boolean satisfiability is NP-complete. The language of quantified Boolean
formulas (QBF) that we defined in Section 29.3.1 also involves a representational system
that is weaker than first-order logic. While it allows quantifiers, it does not allow
functions, so it can describe only finite domains. Yet we showed that QBF appears not even
to be in NP; it is PSPACE-complete. And decidable theories in full first-order logic can
be even harder. For example, we pointed out, in Section 28.9.3, that any algorithm that
decides Presburger arithmetic has time complexity at least O(2^(2^(cn))).

The problem is that finding a proof requires searching a space of possible proofs
and, in real problem contexts, with realistic axiom sets, that space is huge. Substantial
research over the half century since LT appeared has been devoted to techniques for
pruning the space of proofs that must be examined. [Davis and Putnam 1960] defined
conjunctive normal form (B.2) for first-order logic and showed that it could be used as
the basis for a theorem-prover that was substantially more efficient than the others
that were available at the time. But the most significant breakthrough occurred with
the development of the resolution technique that we described in B.2.2.
Resolution-based theorem provers have been used to prove mathematical theorems, to verify the
correctness of programs and to reason in domains such as engineering design and
medicine. In many important domains, however, resolution, as we have described it, is still
not efficient enough because the space of possible proofs is too large.

To  deal  with  these  issues,  practical,  logic-based  systems  make  compromises.  In  the 
next  section,  we  describe  one  approach  to  making  such  compromises. 




M.2.3  Horn  Clauses,  Logic  Programming  and  Prolog 

Suppose  that  we  are  willing  to  consider  only  axioms  with  one  of  the  following  two 
forms: 

•  Implication rules: Each such rule contains no existentially quantified variables. It
contains one instance of the logical connector →. It also contains zero or more
positive literals anded together on its left-hand side and precisely one positive literal on
its right-hand side. So ∀x (R(x)) is an implication rule with a trivial (empty)
left-hand side. ∀x ((P1(x) ∧ P2(x) ∧ … ∧ Pn(x)) → R(x)) is an implication rule with a
nonempty left-hand side.

•  Basic facts: R(A), for some specific individual A.

Further suppose that we are willing to limit the form of the statements that we will
attempt to prove. All variables must be existentially quantified, all literals must be
positive, and the only logical connector that is allowed is ∧. So the following could be a
goal to be proved:

•  ∃x, y (P1(x, y) ∧ P2(x) ∧ … ∧ Pn(y)).

Given  these  constraints,  it  is  possible  to  build  a  theorem  prover  that  exploits  a  single 
reasoning  process,  backward  chaining.  Backward  chaining  works  by  starting  with  a 
goal  (a  statement  to  be  proved).  Note  that  the  goal  may  be  a  conjunction  of  subgoals. 
The  backward  chainer  chooses  one  subgoal  and  looks  to  see  if  it  matches  a  basic  fact.  If  it 
does,  that  subgoal  is  proved:  it  requires  no  further  action  and  the  prover  can  move  on  to 
the  next  subgoal.  Otherwise,  the  backward  chainer  looks  to  see  if  the  subgoal  matches 
the  right-hand  side  of  some  implication  rule.  If  it  does,  the  matched  subgoal  will  be 
replaced  by  the  set  of  literals  that  make  up  the  left-hand  side  of  that  rule. That  estab¬ 
lishes  each  of  those  literals  as  a  new  subgoal. This  process  continues  until  all  subgoals 
have  been  proved. 

A  significant  property  of  the  backward  chaining  process  that  we  will  describe  is  that 
the proofs that it finds are constructive in the following sense: The proof that some
value  of  x  exists  will  include  an  explicit  statement  of  such  a  value. 


EXAMPLE  M.1  Backward  Chaining 

Suppose that we are given the following knowledge base, which is composed of
some implication rules that describe our company's hiring policy (statements
[1]-[5]) and some basic facts of the sort that would be encoded in a recruiter’s
database (statements [6]-[10]).

[1]  ∀x (Famous(x) → Great-hire(x))

[2]  ∀x (Good-major(x) ∧ Great-grades(x) ∧ Did-internship(x) →
Great-hire(x))

[3]  ∀x (Major(x, ComputerScience) → Good-major(x))

[4]  ∀x (Major(x, Engineering) → Good-major(x))

[5]  ∀x (∀y (GPA(x, y) ∧ Greater-than(y, 3.5) → Great-grades(x)))

[6]  Major(John, English)

[7]  Major(Ellen, ComputerScience)

[8]  GPA(Ellen, 3.9)

[9]  Did-internship(John)

[10]  Did-internship(Ellen)

We want to find someone great to hire. We will attempt to do that by proving
∃x (Great-hire(x)) and, in the process, finding such an x. So we set it as a goal. No
facts match the goal. So we look for an implication rule whose right-hand side
matches it. There are two, so we must pick one. We'll use a simple strategy: Try the
rules in the order in which they occur in the knowledge base. So we choose
statement [1]. Using it, we rewrite the original goal as a new one. The easiest way to
envision this process is as a goal tree (not a very bushy one just yet):

        ∃x (Great-hire(x))
                |
            Famous(x)

The  new  goal  cannot  be  satisfied  with  the  knowledge  that  we  have.  So  we  must 
back  up.  This  time,  we  apply  statement  [2]  to  the  original  goal,  producing: 


                      ∃x (Great-hire(x))
         /                    |                    \
  Good-major(x)        Great-grades(x)       Did-internship(x)


We  next  tackle  the  first  new  subgoal.  We  choose  the  first  rule  ([3])  whose  right- 
hand  side  matches  it,  and  we  get: 


                      ∃x (Great-hire(x))
         /                    |                    \
  Good-major(x)        Great-grades(x)       Did-internship(x)
         |
  Major(x, ComputerScience)


Continuing to explore the current path, we exploit fact [7]. This time we must unify
(match) the variable x with the constant Ellen. When we do that, we will have to apply
the resulting substitution to the rest of the goal tree since we must find a single value
of x that satisfies all three subgoals. Also, we'll remember this substitution because it
will be used to construct the answer to the original question. Doing this, we get:


                      ∃x (Great-hire(x))
         /                    |                    \
  Good-major(x)      Great-grades(Ellen)     Did-internship(Ellen)
         |
  Major(x, ComputerScience)
         |
  Major(Ellen, ComputerScience)


The  left-most  branch  reports  that  it  has  succeeded  since  it  matched  against  a 
fact  in  the  knowledge  base.  So  we  now  back  up  and  begin  working  on  the  second 
subgoal. The rest of the process proceeds in a similar way (by immediately
matching facts). So all three branches will succeed with the substitution of Ellen for x.

The answer Ellen can be reported.
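
The search just traced can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by any particular system: variables are written as strings beginning with "?", literals are tuples, Greater-than is treated as a built-in numeric comparison rather than as a stored fact, and the names `FACTS`, `RULES`, `unify`, `rename`, and `prove` are all hypothetical. Facts and rules are tried in knowledge-base order, depth first, as in the example.

```python
import itertools

# Strings starting with "?" are variables; everything else is a constant.
# A literal is a tuple: (predicate, arg1, arg2, ...).

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, subst):
    """Follow bindings until reaching a constant or an unbound variable."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst):
    """Unify two literals; return an extended substitution, or None on failure."""
    if a[0] != b[0] or len(a) != len(b):
        return None
    subst = dict(subst)
    for x, y in zip(a[1:], b[1:]):
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            subst[x] = y
        elif is_var(y):
            subst[y] = x
        else:
            return None
    return subst

FACTS = [
    ("Major", "John", "English"),            # [6]
    ("Major", "Ellen", "ComputerScience"),   # [7]
    ("GPA", "Ellen", 3.9),                   # [8]
    ("Did-internship", "John"),              # [9]
    ("Did-internship", "Ellen"),             # [10]
]

# Each rule is (head, body): the body literals together imply the head.
RULES = [
    (("Great-hire", "?x"), [("Famous", "?x")]),                                    # [1]
    (("Great-hire", "?x"), [("Good-major", "?x"), ("Great-grades", "?x"),
                            ("Did-internship", "?x")]),                            # [2]
    (("Good-major", "?x"), [("Major", "?x", "ComputerScience")]),                  # [3]
    (("Good-major", "?x"), [("Major", "?x", "Engineering")]),                      # [4]
    (("Great-grades", "?x"), [("GPA", "?x", "?y"), ("Greater-than", "?y", 3.5)]),  # [5]
]

_fresh = itertools.count()

def rename(rule):
    """Give a rule's variables fresh names so separate uses do not clash."""
    n = next(_fresh)
    fix = lambda lit: tuple(f"{t}_{n}" if is_var(t) else t for t in lit)
    head, body = rule
    return fix(head), [fix(lit) for lit in body]

def prove(goals, subst):
    """Depth-first backward chaining; yields each substitution proving all goals."""
    if not goals:
        yield subst
        return
    first, rest = goals[0], goals[1:]
    if first[0] == "Greater-than":            # built-in comparison (an assumption)
        a, b = walk(first[1], subst), walk(first[2], subst)
        if not is_var(a) and not is_var(b) and a > b:
            yield from prove(rest, subst)
        return
    for fact in FACTS:                        # try basic facts first, in order
        s = unify(first, fact, subst)
        if s is not None:
            yield from prove(rest, s)
    for rule in RULES:                        # then implication rules, in order
        head, body = rename(rule)
        s = unify(first, head, subst)
        if s is not None:
            yield from prove(body + rest, s)  # replace the subgoal by the rule body

answers = [walk("?x", s) for s in prove([("Great-hire", "?x")], {})]
print(answers)
```

Because `prove` is a generator, backing up after a failed path (as happened with statement [1]) falls out of ordinary iteration: when a branch yields nothing, the enclosing loop simply moves on to the next fact or rule. The answer is read off by looking up the binding of the goal variable in the final substitution, which is the constructive property noted above.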

Note the following key properties of the theorem-proving process that we just
sketched:

•  Questions are stated as goals to be proved.

•  Answers  are  constructed  by  binding  values  to  the  variable(s)  in  the  question. 

•  The  prover  looks  for  a  proof  by  chaining  backwards  from  the  goal,  using  depth-first 
search.  By  focusing  on  the  goal,  it  avoids  searching  in  parts  of  the  knowledge  base 
that  are  unrelated  to  the  current  problem. 

•  The  prover  attempts  to  match  facts  and  rules  in  the  order  in  which  they  occur  in  the 
knowledge  base.  So  the  knowledge  base  builder  can  tell  the  prover  which  paths  are 
more  likely  to  lead  to  solutions. 

•  All  implication  rules  have  a  single  clause  on  the  right-hand  side.  So  right-hand  sides 
match  against  individual  subgoals. 

•  Paths  halt  and  succeed  whenever  they  reach  a  single  goal  that  can  be  matched 
against  a  fact  in  the  knowledge  base. 

•  Paths  halt  and  fail  whenever  it  can  be  determined  that  there  is  no  sentence  in  the 
knowledge  base  that  can  match  the  current  goal. 

Horn  Clauses  and  SLD  Resolution 

Recall, from B.2.1, that a clause is either a single literal or a disjunction of literals. Now
define the following kinds of clauses.

•  A  Horn  clause  is  a  clause  with  at  most  one  positive  literal. 

•  A  positive  Horn  clause  is  a  Horn  clause  with  exactly  one  positive  literal.  A  positive 
Horn  clause  is  also  called  a  definite  clause. 

•  A negative Horn clause is a Horn clause with no positive literals.

•  The  empty  clause  (which  we  call  nil)  is  a  Horn  clause  with  no  literals. 
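
These definitions amount to counting positive literals, which the following sketch makes explicit. The representation is an assumption made for illustration: a literal is a (name, is_positive) pair, a clause is a list of literals read as a disjunction, and the function name `classify` is hypothetical.

```python
# A literal is (name, is_positive); a clause is a list of literals (a disjunction).

def classify(clause):
    """Classify a clause by how many positive literals it contains."""
    positives = sum(1 for _, is_positive in clause if is_positive)
    if positives > 1:
        return "not Horn"
    if not clause:
        # nil has no positive literals, so it also counts as a negative Horn clause
        return "empty (nil)"
    if positives == 1:
        return "positive Horn (definite)"
    return "negative Horn"

rule = [("P1", False), ("P2", False), ("R", True)]   # not P1 or not P2 or R
fact = [("R(A)", True)]                              # R(A)
negated_goal = [("P1", False), ("P2", False)]        # not P1 or not P2
two_positive = [("P(A)", True), ("R(A)", True)]      # P(A) or R(A)

for clause in (rule, fact, negated_goal, [], two_positive):
    print(classify(clause))
```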

The  efficiency  of  resolution  can  be  improved  when  it  is  restricted  to  Horn  clauses.  If 
we  observe  the  restrictions  that  we  stated  at  the  beginning  of  this  section,  then,  when 
our  knowledge  base  and  goal  are  converted  to  clause  form,  all  of  the  clauses  that  will 
be  produced  will  be  Horn  clauses.  This  must  be  so  since: 

•  An implication rule must be of the form ∀x, y, … ((P1(…) ∧ P2(…) ∧ … ∧
Pn(…)) → R(…)). Such a rule, when converted to clause form, will be a positive
Horn clause. It will contain zero or more negative literals (corresponding to the
literals on the left-hand side of the original rule) and exactly one positive literal
(corresponding to the single literal on the right-hand side of the original rule). So it will
look like:

¬P1(…) ∨ ¬P2(…) ∨ … ∨ ¬Pn(…) ∨ R(…).

•  A basic fact is a ground instance of the form R(A), for some specific individual A.
Such  facts  are  already  in  clause  form  and  they  are  positive  Horn  clauses  since  they 
contain  no  negative  literals  and  exactly  one  positive  one. 

•  A goal must be of the form ∃x, y, … (P1(…) ∧ P2(…) ∧ … ∧ Pn(…)). To use
resolution to prove a goal, we begin by negating it, producing ∀x, y, … (¬(P1(…) ∧
P2(…) ∧ … ∧ Pn(…))). Converting that to clause form, we get a negative Horn
clause (i.e., one with no positive literals):

¬P1(…) ∨ ¬P2(…) ∨ … ∨ ¬Pn(…).

We’ll say that a set of clauses is in Horn clause form, or that it is a Horn clause
knowledge base, iff it is a set of clauses all of which are Horn clauses. Given a Horn
clause knowledge base, a resolution theorem prover can start with a negative clause
(corresponding to a negated goal) and use backward chaining in exactly the way that
we used it in Example M.1. In particular, such a theorem prover can avoid considering
those parts of its knowledge base that cannot be relevant to its current goal. To see why
this works, observe the following facts about resolution with Horn clauses:


•  At each resolution step, at least one parent clause must be positive (so that there
exists a pair of complementary literals).

•  Resolution of a negative clause with a fact will create a resolvent that is also a
negative clause. Further, the number of literals in the new negative clause will be one
less than the number in the parent negative clause.

•  Resolution of a negative clause with a rule will create a resolvent that is also a
negative clause. The one positive literal in the rule will form a complementary pair with
one negative literal in the negative clause. So the number of literals in the new
negative clause will be the sum of the number of literals in the two parents minus two.

•  Resolution of a rule with either another rule or a fact will create a resolvent that is
also a rule (i.e., it will be represented as a positive clause). The positive literal from
one of the parents must be complementary to a negative literal in the other parent.
That leaves the positive literal from the other parent as the only positive literal in
the resolvent.


So  the  property  of  being  a  Horn  clause  knowledge  base  is  preserved  by  resolution. 

It is important to point out that Horn clause form differs from clause form (as
defined in B.2.1) in a very important way. Given an arbitrary sentence s in first-order
logic, there exists a clause form representation s′ of s with the property that s′ is
unsatisfiable iff s is. In other words, clause form is a normal form, in our usual sense. Every
logical sentence has an equivalent (with respect to unsatisfiability) representation in
clause form. But that is not so for Horn clause form. There are first-order sentences
that cannot be expressed as Horn clauses. A simple such example is ¬P(A) → R(A).
Converting this sentence to clause form, we get P(A) ∨ R(A), which is not a Horn
clause because it contains two positive literals.

So  Horn  clause  form  is  not  as  expressive  as  full  first-order  logic.  But  it  is  impor¬ 
tant  because  it  is  possible  to  prove  the  following  additional  fact  about  Horn  clause 
resolution: 


•  If c is any negative Horn clause (including nil) and c is entailed by some set S of
Horn clauses, then there is a resolution proof of c with the property that every
resolvent that is created by the proof has, as its two parents, one positive clause in S
and one negative clause. At the first resolution step, this negative clause must, of
course, be one of the clauses in S. At each step after that, the negative clause must
have been the one that was produced at the previous resolution step.


This fact is the basis for an efficient, restricted form of resolution called SLD
resolution. At the first step of an SLD resolution proof, one parent clause must be a
positive clause and one must be a negative clause. At each step after the first, one
parent clause must be a previously generated negative clause (i.e., one that was not
in the original knowledge base) and one must be some positive clause from the
original knowledge base. So, for example, using SLD resolution, we will never resolve two
positive clauses (i.e., rules or facts) together. Nor will we resolve two clauses neither
of which was in the original knowledge base (i.e., two clauses derived by prior
resolution steps).

SLD resolution is refutation complete for Horn clauses. (It is not, however,
refutation complete for arbitrary sets of clauses. See Exercise B.W.) So, if there exists a way to
derive nil from a Horn clause knowledge base S, then there exists a way to derive it
from S using the SLD strategy. No other ways of choosing parent clauses need to be
considered. Notice that SLD resolution implements the set-of-support strategy that we
described in B.1.2. Notice also that SLD resolution can be thought of as backward
chaining from one initial goal, just as we described in Example M.1. Some search and
backtracking may still be required, since there may be more than one way to resolve
the most recent negative clause against the rest of the knowledge base and it may not
be immediately obvious which one (if any) will succeed in producing a contradiction.
But the search is focused only on possible parent clause pairs that include one positive
clause that existed in the original knowledge base and one negative clause that is
directly descended from a single initial negative clause.

Logic Programming and Prolog

Now  consider  an  idea:  A  set  of  logical  statements  can  be  thought  of  as  a  program  that 
solves  problems  by  proving  theorems  about  potential  solutions.  Logic  programming  is  an 
approach  to  programming  that  takes  this  view.  Clearly  logic  programming  only  makes 
sense  as  a  way  to  solve  real  problems  if  it  can  exploit  an  efficient  theorem-proving  engine. 

The most widely used logic programming language is Prolog. Modern Prolog
systems contain a wide assortment of tools. For example, they may support object-oriented
programming, graphical interfaces, and Web-based application development. But the
core of every Prolog system is a Horn clause-based resolution theorem prover that
exploits SLD resolution. Despite the expressive limits of Horn clauses, Prolog has proven
to be a useful, relatively high-level language for expressing rules in many different kinds
of domains, ranging from circuit design to help desks to Web brokers. We'll mention an
application to music in N.1.2.

Prolog programs may be interpreted or compiled. In either case, we need to be able
to talk about how a program will be executed. We'll use the term Prolog virtual
machine to describe the mechanism by which a Prolog program is run. A Prolog program
consists of a knowledge base (of implication rules and basic facts, all of which can be
written as positive Horn clauses) and a goal (with the restricted form described above).
Since reasoning is done using resolution, the Prolog virtual machine, given a goal G,
begins by negating G to produce the only negative clause in its knowledge base. Then
the Prolog virtual machine uses depth-first search and SLD resolution to reason
backwards from ¬G. So, at each resolution step, the negative clause that was created at the
previous step will be one of the parent clauses unless all ways of resolving with it have
already been tried. In that case, the prover will back up to the next most recently
generated negative clause and look for an alternative way to use it as a parent clause. This
process continues until either the empty clause, nil, is generated or there is nothing left
to do. If nil is generated (i.e., a contradiction and thus a proof has been found), then
the variable bindings that led to the contradiction of the knowledge base with ¬G can
be returned. So, when the Prolog virtual machine proves a goal of the form
∃x, y, … (P1(…) ∧ P2(…) ∧ … ∧ Pn(…)), it also returns values for the variables
that make (P1(…) ∧ P2(…) ∧ … ∧ Pn(…)) true.

When resolving with a negative clause G, the Prolog virtual machine will consider
the literals in G one at a time, left to right. To work on each of those literals, it will
consider the knowledge base clauses in the order in which they were written.


EXAMPLE  M.2  The  Order  in  Which  a  Prolog  Program 
Considers  Resolvents 

Suppose that the most recently generated negative clause is ¬R(x) ∨ ¬P(x)
∨ ¬S(x), and suppose that the knowledge base contains the following clauses:

[1]  ¬T(x) ∨ P(x).

[2]  ¬G(x) ∨ P(x).

[3]  ¬H(x) ∨ R(x).

[4]  ¬M(x) ∨ S(x).

[5]  H(x).

The Prolog virtual machine will first look for a way to use ¬R(x) as a
complementary literal. So it will resolve ¬R(x) ∨ ¬P(x) ∨ ¬S(x) with [3], producing the
new negative clause ¬H(x) ∨ ¬P(x) ∨ ¬S(x). It will next try to use ¬H(x) as a
complementary literal, so it will resolve with [5], producing ¬P(x) ∨ ¬S(x). Next
it will try to use ¬P(x). There are two rules that contain the literal P(x). It will try
[1] first, producing ¬T(x) ∨ ¬S(x). It will continue from here until it either proves
nil or hits a dead end. In the latter case, it will back up to the first negative clause for
which some alternative resolution step exists.
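
The order of these steps can be traced with a small sketch. It is a simplification, not the Prolog virtual machine itself: the variable x is dropped so that each predicate becomes a plain atom, and the names kb, sld, and trace are assumptions made for illustration. Each knowledge base entry is a (head, body) pair, listed in the order [1] through [5]:

```python
# Propositional simplification of Example M.2 (the variable x is dropped).
# Each entry is (head, body); a fact has an empty body.
kb = [
    ("P", ["T"]),   # [1] ¬T ∨ P
    ("P", ["G"]),   # [2] ¬G ∨ P
    ("R", ["H"]),   # [3] ¬H ∨ R
    ("S", ["M"]),   # [4] ¬M ∨ S
    ("H", []),      # [5] H
]

def sld(goal, kb, trace):
    """Depth-first SLD search: leftmost literal first, clauses in written order."""
    trace.append(list(goal))          # record each negative clause as it is visited
    if not goal:
        return True                   # the empty clause (nil) was derived
    first, rest = goal[0], goal[1:]
    for head, body in kb:
        if head == first and sld(body + rest, kb, trace):
            return True
    return False                      # dead end: the caller tries its next clause

trace = []
result = sld(["R", "P", "S"], kb, trace)
for clause in trace:
    print(" ∨ ".join("¬" + atom for atom in clause) or "nil")
print(result)
```

With only these five clauses, the trace visits ¬R ∨ ¬P ∨ ¬S, ¬H ∨ ¬P ∨ ¬S, ¬P ∨ ¬S, ¬T ∨ ¬S and then, after backing up, ¬G ∨ ¬S. Both branches hit a dead end, so no proof is found. A rule such as p :- p would make this sketch recurse without bound, just as depth-first search can fail to terminate in Prolog.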


Notice that the control strategy that the Prolog virtual machine uses enables
knowledge base builders to encode, in their knowledge bases, information about which facts
and rules are more likely to be useful. The more useful clauses get written ahead of the
less useful ones. So, on average, we can expect that the Prolog virtual machine finds
proofs faster than would a prover that was working without any such "hints".

Different dialects of Prolog differ in their syntax. In the examples that we are about
to present, we will use the syntax as defined in [Clocksin and Mellish 1981]. Variables
are written in upper case and constants in lower case. A Prolog program consists of a
goal, plus a knowledge base of implication rules and basic facts. The syntax of each of
these pieces is shown next. Variables in rules and facts are universally quantified.
Variables in goals (also called queries) are existentially quantified. The logical connective
AND is written as a comma.

•  A goal, such as ∃x (P(x) ∧ Q(x) ∧ T(x)), is written in Prolog as:

?- p(X), q(X), t(X).

So, for example, the following are legal Prolog goals:

?- know(marcus, caesar).            /* Does Marcus know Caesar? */

?- know(marcus, X).                 /* Does Marcus know anyone
                                       (and, if so, whom)? */

?- know(X, Y), know(Y, marcus).     /* Does anyone know someone
                                       who knows Marcus? */

•  Implication rules can all be written as positive Horn clauses. But, in a Prolog program,
they are written in their more natural form, except that the right-hand side and
left-hand side are reversed (suggesting the way that the rules are used in backward
chaining). So the mapping between implication rules, Horn clause form, and Prolog syntax is:

•  Implication rule:  ∀x ((P(x) ∧ Q(x) ∧ … ∧ T(x)) → R(x)).

•  Horn clause form:  ¬P(x) ∨ ¬Q(x) ∨ … ∨ ¬T(x) ∨ R(x).

•  Prolog rule syntax:  r(X) :- p(X), q(X), …, t(X).




Read the symbol :- as "if". Read a comma as "and". So the Prolog rule we just
showed can naturally be read as, "R must be true if P, Q, …, and T are."
Alternatively, it can be thought of as saying, "Rewrite the goal R as the set of subgoals
P, Q, …, T and then attempt to prove them." An example of an implication rule
written in Prolog syntax is:

living-ancestor-of(X, Y) :- mother-of(X, Y), alive(X).

This rule corresponds to the implication rule ∀x, y ((Mother-of(x, y) ∧ Alive(x))
→ Living-ancestor-of(x, y)).

•  A fact, such as GPA(Ellen, 3.9), is written in Prolog as gpa(ellen, 3.9). The facts
that a Prolog program uses may be listed explicitly in the program itself or they
may be contained in one or more databases to which the program refers.


EXAMPLE  M.3  Prolog  Implements  Backward  Chaining 
Using  Resolution 


To see how the Prolog virtual machine conducts a depth-first search using
resolution, we'll return to our hiring example. The implication rules and the facts of
Example M.1 can be written as a Prolog program as follows:

great-hire(X) :- famous(X).

great-hire(X) :- good-major(X), great-grades(X),
                 did-intern(X).

good-major(X) :- major(X, computer-science).

good-major(X) :- major(X, engineering).

great-grades(X) :- gpa(X, Y), greater-than(Y, 3.5).

major(john, english).

major(ellen, computer-science).

gpa(ellen, 3.9).

did-intern(john).

did-intern(ellen).


This program creates the following list of Horn clauses:

¬Famous(x) ∨ Great-hire(x).

¬Good-major(x) ∨ ¬Great-grades(x) ∨ ¬Did-intern(x) ∨ Great-hire(x).

¬Major(x, ComputerScience) ∨ Good-major(x).

¬Major(x, Engineering) ∨ Good-major(x).

¬GPA(x, y) ∨ ¬Greater-than(y, 3.5) ∨ Great-grades(x).

Major(John, English).

Major(Ellen, ComputerScience).

GPA(Ellen, 3.9).

Did-intern(John).

Did-intern(Ellen).

We again want to prove ∃x (Great-hire(x)). We state that goal in Prolog as:

?- great-hire(X).

The Prolog virtual machine negates the goal and creates the negative Horn
clause ¬Great-hire(x). Resolution then answers the question by producing the
following proof:


¬Great-hire(x)
    resolved with  ¬Good-major(x) ∨ ¬Great-grades(x) ∨ ¬Did-intern(x) ∨ Great-hire(x)

¬Good-major(x) ∨ ¬Great-grades(x) ∨ ¬Did-intern(x)
    resolved with  ¬Major(x, ComputerScience) ∨ Good-major(x)

¬Major(x, ComputerScience) ∨ ¬Great-grades(x) ∨ ¬Did-intern(x)
    resolved with  Major(Ellen, ComputerScience)

¬Great-grades(Ellen) ∨ ¬Did-intern(Ellen)
    resolved with  ¬GPA(x, y) ∨ ¬Greater-than(y, 3.5) ∨ Great-grades(x)

¬GPA(Ellen, y) ∨ ¬Greater-than(y, 3.5) ∨ ¬Did-intern(Ellen)
    resolved with  GPA(Ellen, 3.9)

¬Greater-than(3.9, 3.5) ∨ ¬Did-intern(Ellen)
    resolved with  Greater-than(3.9, 3.5)  (computed to be true)

¬Did-intern(Ellen)
    resolved with  Did-intern(Ellen)

nil
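
The search that produces this proof can be imitated in a few lines of Python. What follows is a minimal sketch under stated assumptions, not a description of any real Prolog implementation: terms are tuples, strings that start with an upper-case letter are variables, greater-than is handled as a built-in numeric comparison, and there is no cut and no occurs check. The names is_var, walk, unify, rename, solve, and KB are all hypothetical.

```python
import itertools

def is_var(t):
    # Assumption: a variable is a string that starts with an upper-case letter.
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    # Follow variable bindings in substitution s until a non-bound term is reached.
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Return an extended substitution that makes a and b equal, or None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def rename(t, n):
    # Give each use of a clause its own copies of the clause's variables.
    if is_var(t):
        return f"{t}_{n}"
    if isinstance(t, tuple):
        return tuple(rename(x, n) for x in t)
    return t

KB = [  # (head, body) pairs in the order in which the program lists them
    (("great-hire", "X"), [("famous", "X")]),
    (("great-hire", "X"), [("good-major", "X"), ("great-grades", "X"), ("did-intern", "X")]),
    (("good-major", "X"), [("major", "X", "computer-science")]),
    (("good-major", "X"), [("major", "X", "engineering")]),
    (("great-grades", "X"), [("gpa", "X", "Y"), ("greater-than", "Y", 3.5)]),
    (("major", "john", "english"), []),
    (("major", "ellen", "computer-science"), []),
    (("gpa", "ellen", 3.9), []),
    (("did-intern", "john"), []),
    (("did-intern", "ellen"), []),
]

fresh = itertools.count()

def solve(goals, s):
    """Yield every substitution that proves all goals: leftmost goal first, depth-first."""
    if not goals:
        yield s
        return
    first, rest = goals[0], goals[1:]
    if first[0] == "greater-than":            # built-in, computed rather than looked up
        a, b = walk(first[1], s), walk(first[2], s)
        if isinstance(a, (int, float)) and isinstance(b, (int, float)) and a > b:
            yield from solve(rest, s)
        return
    for head, body in KB:
        n = next(fresh)
        s2 = unify(first, rename(head, n), s)
        if s2 is not None:
            yield from solve([rename(g, n) for g in body] + rest, s2)

answers = [walk("X", s) for s in solve([("great-hire", "X")], {})]
print(answers)  # ['ellen']
```

The first great-hire rule is tried first and fails, because no famous facts exist. The search then backs up to the second rule, which is why the proof above begins there.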


To make Prolog useful in solving real and complex problems, the structure that we
have just described must be augmented in several ways. One is to add the "cut"
operator, which lets a programmer specify, anywhere in a clause, that once the literals to the
left of the cut have been solved successfully, the interpreter may not back up and try to
resolve them in some other way.

A second important extension helps to make the job of describing a complex world
tractable. To see how it works, we'll first describe an assumption that is reasonable in many
domains: The closed world assumption says that the only facts that are true are those that
have been explicitly declared. If a database (or knowledge base) satisfies the closed world
assumption, then if we want to prove ¬P, it suffices to look for P and fail to find it. In
many domains, it may not be reasonable to make the closed world assumption for all
predicates, but there may be some (even many) predicates for which it does make sense.

For example, we can assume that a travel database contains all scheduled airline
flights. We can assume that a university registrar's database contains all scheduled
classes. We can assume that a company's personnel database contains all of the
company's employees. Finally, suppose that we want to build a planning agent that must limit
its actions to things that are legal. In most systems of law, we don't list legal things.
What we do is to write laws that define illegal activities. Then anything that is not
explicitly illegal is legal (until someone figures out how to ban it). So suppose that we
have the following Prolog rule:

choose-action(X) :- reasonable-cost(X), legal(X).

Assuming also a list only of illegal actions, how can the Prolog interpreter, given a
proposed action X, show that X is legal? The answer is that the Prolog interpreter
provides an operator (generally \+) that implements a form of reasoning called negation
as failure. The negation as failure inference rule says that P may be concluded if it is
not possible to prove ¬P. So, we could write:

legal(X):-  \+  illegal(X). 

Note that the negation as failure rule is a form of nonmonotonic reasoning. The
addition of a fact (for example, the assertion that some new activity is illegal) could make
it no longer possible to prove a claim that could, before the addition of the new fact,
have been proved. Thus Prolog programs can implement reasoning that cannot be
described in standard first-order logic.
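
The nonmonotonic behavior is easy to see in a small sketch. The set of illegal actions below is invented for illustration, and the function names are assumptions; legal simply reports whether the attempt to prove illegal fails, in the spirit of \+:

```python
# Hypothetical knowledge base: only illegal actions are listed.
illegal = {"jaywalking", "littering"}

def provable_illegal(action):
    # Under the closed world assumption, illegal(X) is provable only if it is listed.
    return action in illegal

def legal(action):
    # Negation as failure: succeed exactly when proving illegal(action) fails.
    return not provable_illegal(action)

before = legal("skateboarding")      # nothing says it is illegal, so it is legal
illegal.add("skateboarding")         # a new fact is added to the knowledge base
after = legal("skateboarding")       # the earlier conclusion can no longer be drawn

print(before, after)  # True False
```

Adding a fact removed a conclusion, which cannot happen in a monotonic logic.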


M.2.4  Do Undecidability and Intractability Doom a Logical
Approach to Artificial Intelligence?

The techniques that we have just described, plus others, make it possible to build
logic-based reasoning systems that can solve a wide variety of problems. But we know that
first-order logic is not powerful enough to describe all the things that we might want to
say to a general-purpose reasoning system. For example, since it is monotonic, it doesn't
let us describe and exploit the fact that birds fly unless there is something special about
them. To use this fact, we'd want to be able to conclude that Tweety flies if all we know
about him is that he is a bird. We'd also want to be able to undo that conclusion if we
later find out that Tweety has had his wings clipped. As another example of the limits
of first-order logic, consider reasoning about belief. Suppose that P(Felix) is a
predicate and we want to assert that Linus believes it to be true. Then we might like to be
able to write Believes(Linus, P(Felix)). But in first-order logic, predicates cannot be
arguments to other predicates. To solve these and other problems, a variety of more
expressively powerful logical systems have been developed.

Suppose that we want to build an artificial agent with human-like capabilities and
that we have succeeded in building a logical theory that captures enough relevant
knowledge about the world (a wild stretch from the current state of the art). If for no
other reason than that the world is changing, the theory that we will build will necessarily
be incomplete (in the sense that there are statements that are true in the world but
unprovable in the theory). So suppose further that we have also developed techniques for
acquiring (learning) new information as it is required.

Are we, nevertheless, doomed? In particular, we know that first-order logic (even
without the enhancements mentioned above) is undecidable and intractable. Does this
mean that any attempt to build a powerful logical reasoning agent must necessarily
fail? We might guess that the answer to this question is no. After all, despite his proof
fifteen years earlier of the undecidability of the Entscheidungsproblem, Alan Turing
forecast in the middle of the last century that, by the turn of the century, intelligent
programs would exist.

Although we have not yet succeeded in building an artificial system that rivals the
intelligent behavior of people, the negative results that we have presented throughout this
book do not necessarily doom our attempt to do so. To see why they don't, consider:

The undecidability issue: Theorem 22.4 tells us that no algorithm exists that always
halts and decides whether a statement s is a theorem in some particular logical theory.
So, given any theorem-deciding algorithm that we might attempt to write, there will be
some statements on which the algorithm will fail to halt. But Theorem 22.4 does not
say that we can't build an algorithm that halts with the correct result much (or even
most) of the time we actually call it. In particular, theoremhood in first-order logic is
semidecidable. So, if our program attempts to prove statements that are theorems, it
will always succeed (assuming we wrote it correctly). We can imagine building a
program that, with enough knowledge, rarely tries to prove something that turns out to be
false. More generally, we can build theorem provers with the property that, if we
consider only statements that have been constructed because they say something relevant
to the task at hand, it is overwhelmingly likely that they will be either provable or
disprovable. Further, we can prevent our agent from getting hopelessly stuck by imposing
an effort limit on each proof attempt. If a proof cannot be found within the assigned
amount of time, the agent can give up and try something else.
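
An effort limit of this kind can be sketched as a step budget on a simple prover. The sketch below is propositional and purely illustrative: the names prove, try_to_prove, and OutOfEffort are assumptions, and a real agent might bound time or memory rather than resolution steps.

```python
class OutOfEffort(Exception):
    """Raised when a proof attempt uses up its step budget."""

def prove(goals, kb, budget):
    """Depth-first backward chaining; budget is a one-element list of remaining steps."""
    if not goals:
        return True
    first, rest = goals[0], goals[1:]
    for head, body in kb:
        if head == first:
            if budget[0] == 0:
                raise OutOfEffort
            budget[0] -= 1               # each resolution step costs one unit of effort
            if prove(body + rest, kb, budget):
                return True
    return False

def try_to_prove(goal, kb, max_steps):
    try:
        return "proved" if prove([goal], kb, [max_steps]) else "failed"
    except OutOfEffort:
        return "gave up"

# Hypothetical knowledge base: the rule p :- p sends depth-first search into a loop.
kb = [("p", ["p"]), ("q", [])]

print(try_to_prove("q", kb, 100))   # proved
print(try_to_prove("p", kb, 100))   # gave up
print(try_to_prove("r", kb, 100))   # failed
```

The third outcome matters: "gave up" says only that the budget ran out, not that the statement is false.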

We started with the goal of building an intelligent agent, perhaps one that could
rival people in performing some or all of the things we do. It is hard to compare people
to Turing machines, primarily because people have sophisticated perceptual systems
and Turing machines don't. We are only beginning to understand how to construct
realistic computational models of those human systems. But there is no reason to believe
that people have any special oracle-like capability that would enable us to compute a
logical function that a Turing machine could not compute. The fact is that the
undecidability of first-order logic limits the way people use logic just as it limits what programs
can do. Yet we solve hard problems all the time. Appropriately designed programs,
once we understand how to build them, will be able to do the same thing.

The intractability issue: The intractability claims that we have made about logical
systems are claims about worst-case performance. There exist techniques, such as the
ones we've described, that succeed much of the time, particularly when there is
structure to the space that needs to be searched. So there may be relevant conclusions that
will not be found soon enough to be useful. But that happens to people too.

The bottom line: There are reasons, beyond the undecidability and intractability of
FOL, that artificial intelligence remains difficult and appears to require techniques
beyond the logical ones that we have been discussing. We suggested in Section M.1, for
example, that statistical techniques can be useful, both for representing knowledge and
for acquiring it. Even with those techniques, though, no program has yet passed the
Turing test or equaled the ability of people to analyze photographs.

But the reason that no AI program has done these things is not that FOL is
undecidable and even simpler logics are intractable. The proofs we have presented for our
undecidability and intractability results do not depend on the physical mechanism by
which computation is performed. So they apply to people, as well as to machines. And
people remain the existence proof that intelligence is possible.


M.2.5  A Complete and Decidable Legal System?

Going the other way, the negative results that we have presented do tell us that we
cannot expect a mechanized logical system to solve all the problems inherent in the
less formal systems that people have built up over the centuries. We consider here one
example.*

When we say that our society is based on the rule of law, we assume that we can
write a set of specific laws with four important properties:

•  The laws capture the rules by which we want to live. So it must be the case that all
allowable actions are legal and all unallowable actions are not legal. (This requirement
assumes that we can all agree on how we want to live. We'll ignore that issue here.)

•  The set of laws is consistent. In other words, we must guarantee that there is no
action A that can be shown to be both legal and illegal.

•  The set of laws is finite and reasonably maintainable. So, for example, we must
reject any system that requires a separate law, for each specific citizen, mandating
that that citizen pay taxes.

•  It is possible to answer the question, "Given an action A, is A legal?"

Will we ever be able to write a single set of laws that satisfies all of those goals?

To do so requires that there be some deductive system in which we can:

•  Describe each individual law.

•  Derive conclusions about the legality of specific actions using some collection of
rules of inference applied to the set of laws.

If we could be content with, say, the following set of laws, then we could create a
decidable system:

•  Mr. Smith must pay his taxes this year.

•  Either Ms. Jones or Ms. Garcia must fix the potholes this year.

We can represent these laws in Boolean logic (for example, with the two axioms
SPT and JFP ∨ GFP). Since Boolean logic is decidable, we can prove, for example, that
SPT ∧ JFP is legal, while (¬JFP ∧ ¬GFP) is not legal.
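
Because there are only three Boolean variables, the decision procedure can be a truth table. The sketch below makes one assumption that the text leaves implicit: a situation counts as legal if it is consistent with the laws, that is, if some truth assignment satisfies both. The names VARS, laws, and is_legal are illustrative.

```python
from itertools import product

VARS = ["SPT", "JFP", "GFP"]   # Smith pays taxes; Jones, Garcia fix potholes

def laws(v):
    # The two axioms: SPT and (JFP ∨ GFP).
    return v["SPT"] and (v["JFP"] or v["GFP"])

def is_legal(situation):
    """Assumed reading of 'legal': some assignment satisfies the laws and the situation."""
    for values in product([False, True], repeat=len(VARS)):
        v = dict(zip(VARS, values))
        if laws(v) and situation(v):
            return True
    return False

print(is_legal(lambda v: v["SPT"] and v["JFP"]))            # True
print(is_legal(lambda v: not v["JFP"] and not v["GFP"]))    # False
```

The loop always halts after 2^3 = 8 assignments, which is what decidability means here. No such exhaustive table exists once quantifiers over an unbounded domain are allowed.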

* The question we describe here was suggested by Ben Kuipers.

But clearly this won't do. We require a first-order logical system so that we can write
laws such as:

•  ∀x (∀y (citizen(x) ∧ year(y) ∧ alive(x, y) → paystaxesforyear(x, y))).

•  ∀y (∃x (year(y) → fixespotholes(x, y))).

There are, of course, first-order theories that are complete and thus decidable. But
we know, from Section 22.4.2, that any consistent first-order theory that is powerful
enough to describe the integers, along with addition and multiplication, cannot be
either complete or decidable. We cannot describe our modern legal and business system
without those capabilities, plus a myriad of others. (If you doubt this, take a look at last
year's income tax forms.) Thus we cannot construct a decidable system of laws.


M.3  A  Rule-Based  Foundation  for  Artificial  Intelligence 
and  Cognition 

In the early days of the development of formal models of computation, Emil Post
proposed [Post 1943] a family of computational models based on the idea of rewrite
systems (alternatively called production systems or rule-based systems). We mentioned
one of them, Post production systems, in Chapter 18. While it has turned out that
Turing's model has proved more useful than Post's as a basis for analyzing computability,
Post's ideas have served as the inspiration for at least five families of important
computational systems. We have already described one of them: the use of grammars to
define languages. In particular, Post's ideas inspired the design of BNF as a tool for
defining context-free languages.

The other four families of rule-based systems that we will mention have arisen from
the fact that it is natural to model many kinds of human cognitive processes as rewrite
(or production rule) systems. So we will consider:

•  The use of production-rule architectures as models not just of what people can do
but of how they do it. Rules of this sort are thus used to define cognitive models of
people.

•  The use of production rules to encode the knowledge that people use when they
solve many kinds of problems that require specific expertise. Rules of this sort are
used by programs that have come to be called expert systems.

•  The use of production rules to encode the business practice rules by which
organizations function.

•  The use of production rules to describe the behavior of nonplayer characters
(NPCs) in interactive games. We'll discuss this application in N.3.3, where we'll see
rules being used in much the same way in which they are used to encode human
knowledge in expert systems.

But rule-based systems are not limited to these problem areas. For example, there is
a rule-based system that enhances voice signals over noise in hearing aids.

M.3.1  The Architecture of a Rule-Based System

A  rule-based  system  has  three  main  components: 

•  a  knowledge  base,  which  typically  contains  a  collection  of  rules  and  a  collection  of 
basic  facts. 

•  working  memory,  and 

•  an  interpreter,  often  called  an  inference  engine,  that  controls  the  way  in  which  rules 
and  facts  are  applied  to  solve  a  problem. 

The  Knowledge  Base 

In a rule-based system, problem-solving knowledge is encoded primarily as a set of
rules that can be thought of as condition/action pairs. Whenever a rule's left-hand side
matches the current problem situation, its right-hand side can be used to make
progress toward finding a solution. The rules typically appeal to facts about the
problem domain and the entities in it. So the rule base is generally augmented with a
declarative knowledge base that can be represented in a standard database format, in a
logical description language, or in some other knowledge-representation formalism.

The way in which rules are written varies across rule-based systems, primarily as a
function of the knowledge the rules encode and how they are going to be used to solve
problems. We've already seen one way to represent rules: as Prolog programs. The
Prolog approach may be appropriate when the facts can be stated in Prolog's logical
language and when problem-solving is to be done by backward chaining. We'll briefly
mention a few other examples that illustrate the kinds of knowledge and reasoning
that rule-based systems can encode. In all of them, we'll describe rules in English so
that they make sense to us. The internal representation will depend on the tools that
are used to build the system.


EXAMPLE  M.4  Financial  Planning 

Rule-based systems have been used in a variety of financial applications. Consider
the following two simplified rules from a financial planning system:

If:    age > 50, and
       time-to-retirement < 10 years, and
       children to support,

Then:  personal-state is conservative.

If:    personal-state is conservative, and
       financial-state is aggressive, and
       risk-tolerance is high,

Then:  buy x% bonds, y% stocks, and z% cash.

Notice that these rules can be chained together to enable the system to reason
from basic facts, provided by the user, to a conclusion that describes an
appropriate action.


EXAMPLE  M.5  Computing  Damage  Claims  in  a  Civil  Suit

Sometimes,  although  we  require  a  single  answer,  many  factors  may  affect  how 
that  answer  should  be  computed.  Then  we  can  write  a  collection  of  rules,  each  of 
which  modifies  the  answer  in  some  appropriate  way.  For  example,  a  rule-based 
system  that  provides  advice  on  the  size  of  a  damage  claim  that  it  would  make 
sense  to  seek  in  a  civil  suit  might  have  rules  like: 

If:  disfiguring  injury. 

Then:  add  $100,000  to  damages. 

If:  work-time-lost  >3  weeks. 

Then:  add  $75,000  to  damages. 
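
Rules of this additive kind can be sketched as condition/amount pairs whose amounts are summed whenever the condition holds. The field names in the case record below are invented for illustration; only the two dollar amounts and thresholds come from the rules above:

```python
# Each rule is (condition, amount to add). The case fields are hypothetical names.
rules = [
    (lambda case: case["disfiguring_injury"], 100_000),
    (lambda case: case["work_weeks_lost"] > 3, 75_000),
]

def damages(case):
    """Sum the contribution of every rule whose condition matches the case."""
    return sum(amount for condition, amount in rules if condition(case))

case = {"disfiguring_injury": True, "work_weeks_lost": 5}
print(damages(case))  # 175000
```

Because every matching rule contributes independently, the order in which the rules are listed does not affect the result.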


EXAMPLE  M.6  Medical  Diagnosis 

Rule-based systems have been widely used in expert systems that perform
diagnosis tasks, ranging from medicine to computer system repair. In these systems,
there is often some degree of uncertainty that should be attached to the
conclusion of each rule. In this simple example rule, read (.8) to mean that on some scale
(sometimes 0 to 1, sometimes -1 to 1, depending on the system), the certainty
that should be attached to the conclusion of this rule is .8.

If:  spots,  and 

fever,  and 
aches. 

Then:  chicken  pox  (.8). 


Working  Memory 

In order for a rule to be applied, its left-hand side must match against something and
its right-hand side must act on something. Those somethings are called working
memory. When we use a grammar to derive a string, working memory is simply a
single string, which we have been calling the working string. In other kinds of rule-based
systems, working memory may have a more complex structure. It may start out
empty. Assertions may be added to it as data are provided, either by a user or by
some other input mechanism. As rules fire, the contents of the working memory will
change.

The  Inference  Engine 

Rules are applied to the contents of working memory and results are computed by the
action of a rule interpreter that is usually called an inference engine. Recall that in
Section 11.1, we presented a very general definition of a rule-based system. We pointed
out there that, to build a particular rule-based system, it is necessary to define a rule
interpreter that specifies how and when rules will be applied. The design of such an
interpreter requires an answer to each of the following questions.

• Efficient matching: How can the left-hand sides of a large rule base be compared
efficiently against the contents of working memory? Many inference engines solve this
problem by exploiting the RETE algorithm, which we will describe briefly below.

• Conflict resolution: A conflict occurs whenever there is more than one way to
match  the  rules  against  working  memory.  How  will  such  conflicts  be  resolved  so 
that  a  single  rule  can  be  chosen  to  be  applied  next? 

• Direction of reasoning: In what direction will reasoning proceed? One simple answer
is to use forward chaining and to reason from observables to conclusions. An
alternative  is  to  use  backward  chaining  (as  in  Prolog)  and  to  reason  from  goals  back 
to  observables.  Various  hybrid  approaches  may  also  make  sense. 

Strategies  for  conflict  resolution  and  reasoning  direction  vary  as  a  function  of  the 
problem that is being solved. For example, in some task domains, it makes sense to resolve
conflicts using a recency heuristic in which rules that match against facts that have
recently  been  added  to  working  memory  will  be  given  priority  over  rules  that  match 
against  older  information.  But  that  heuristic  isn’t  appropriate  in  some  other  domains. 
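
One way to picture a recency heuristic is the sketch below. It assumes, purely for illustration, that every fact in working memory carries an integer timestamp recording when it was added, and that each candidate match is a pair consisting of a rule name and the facts it matched. The function name select_by_recency and the rule names are hypothetical.

```python
def select_by_recency(matches, timestamp):
    """Pick the match whose most recently added fact is newest.

    matches:   list of (rule_name, matched_facts) pairs
    timestamp: dict mapping each fact to the time it entered working memory
    """
    return max(matches, key=lambda m: max(timestamp[f] for f in m[1]))


timestamp = {"spots": 1, "fever": 2, "aches": 3}
matches = [("rule_a", ["spots", "fever"]), ("rule_b", ["aches"])]
chosen = select_by_recency(matches, timestamp)
```

Here rule_b is chosen because it matched the fact that arrived last. A real inference engine would also need a policy for ties.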

Efficient matching is, however, an issue that is important in any rule-based system
that exploits a nontrivial-sized rule base or working memory. Many widely used inference
engines solve the matching problem by exploiting some version of the RETE
algorithm [Forgy 1982]. The name RETE comes from the Latin word rete, which
means net or network. Two key ideas underlie the design of RETE:

• Instead of simply placing rules in a flat list and treating all of the left-hand sides as
independent patterns to be matched, it makes sense to build a tree. The nodes of the
tree correspond to patterns that occur in left-hand sides of rules. The leaf nodes correspond
to complete left-hand sides, so they point to the associated right-hand sides.

•  Working  memory  doesn’t  change  very  often.  Thus,  by  and  large,  the  rules  that 
matched  the  last  time  we  checked  will  still  match.  So,  instead  of  starting  from 
scratch  and  comparing  each  left-hand  side  against  working  memory  each  time  a 
rule  must  be  selected,  let  each  node  in  the  pattern  tree  keep  track  of  the  working 
memory  elements  that  match  it.  Whenever  working  memory  changes,  update  the 
appropriate  values  at  the  tree  nodes. 
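
The second idea can be illustrated with a deliberately simplified sketch. It is not the RETE algorithm: conditions here are literal facts with no variables, and there is no shared pattern tree beyond a simple index. It shows only how match state can be stored and then updated incrementally as working memory changes. The class name IncrementalMatcher and the rule named flu are assumptions made for the example.

```python
class IncrementalMatcher:
    """Track, per rule, which conditions are still unmatched."""

    def __init__(self, rules):
        # rules: dict mapping rule name -> set of conditions (literal facts).
        self.index = {}  # condition -> names of the rules that mention it
        for name, conditions in rules.items():
            for c in conditions:
                self.index.setdefault(c, set()).add(name)
        # Working memory starts empty, so every condition is missing.
        self.missing = {name: set(conds) for name, conds in rules.items()}

    def add(self, fact):
        # Only rules that mention the new fact are touched.
        for name in self.index.get(fact, ()):
            self.missing[name].discard(fact)

    def remove(self, fact):
        for name in self.index.get(fact, ()):
            self.missing[name].add(fact)

    def matched(self):
        """Names of the rules whose left-hand sides are fully matched."""
        return sorted(n for n, m in self.missing.items() if not m)


matcher = IncrementalMatcher({
    "chicken_pox": {"spots", "fever", "aches"},
    "flu": {"fever", "aches"},  # hypothetical second rule
})
matcher.add("fever")
matcher.add("aches")
after_two_facts = matcher.matched()
matcher.add("spots")
after_three_facts = matcher.matched()
```

Each change to working memory costs time proportional to the number of rules that mention the changed fact, rather than to the size of the whole rule base.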

Inference engines can be implemented independently of the specific rule sets that
will run on them. There exist a variety of commercial tools that include inference engines
and that can be tailored for use in rule-based systems that solve particular tasks.


M.3.2 Cognitive Modeling

One of the earliest uses of rule-based systems was in the design of programs that were
intended to model various aspects of human cognitive performance. Such models have
turned out to be important, both in the study of cognitive psychology and in the design
of programs that require sophisticated interfaces with people. For example, rule-based
cognitive models have been used as the basis for the design of tutoring systems that
can form guesses about the misconceptions of students. Then they tailor their instruction
to the specific goal of repairing those misconceptions. Another important application
has been to the design of agents in interactive video games. Here the goal is to
construct agents whose behavior seems human to the game's human players.

Research on general cognitive architectures has led to the development of several
software platforms that are used to create systems that model human intelligent behavior.
Two of the most comprehensive and influential such architectures are SOAR
(described in [Newell 1990] and [Laird, Newell and Rosenbloom 1987]) and ACT-R
(described in [Anderson and Lebiere 1998] and [Anderson et al. 2004]). At the heart of
these systems is the principle that intelligent action arises from perceiving the environment
and then responding to what has been observed. So these architectures exploit a
production-rule system to model the way in which such responses are constructed.
When left-hand sides of rules match against working memory (which may be organized
into submemories, such as a long-term and a short-term memory), right-hand sides describe
appropriate conclusions and actions.

M.3.3  Expert  Systems 

The term “expert system” is generally used to describe a program that performs a task
that is more traditionally performed by a human expert. This contrasts with other kinds
of AI programs, for example image understanders and common sense reasoners, that
do things that even children can do well. Expert systems exploit many techniques for
representing task domain knowledge and for reasoning with it. But one of the most important
techniques is rule-based systems. It turns out that, for many kinds of problems,
the wisdom that experts have accumulated can be captured in a set of pattern/action
rules of the sort we showed in Examples M.4, M.5 and M.6.

Rule-based expert systems have been used to solve real problems in domains as
varied as:

•  airplane  maintenance, 

•  quality  control  in  manufacturing, 

•  insurance  underwriting, 

•  clinical  decision  support, 

• identifying archeological artifacts,

•  pest  control  in  agriculture,  and 

•  education. 

Commercial tools support the construction of such systems by providing a domain-independent
inference engine, as well as support for eliciting rules from human experts
and for building interfaces between the rule base and other data sources.

M.3.4  Rule-Based  Systems  for  Modeling  Business  Practices 

Every successful, complex organization operates (most of the time) systematically. So
there exists a set of rules, whether they are written down or not, that describe what the
organization does and how it does it.

EXAMPLE M.7 Some Simple Business Rules

A  human  resources  rule  set  might  include  the  following: 

If: an employee has been employed fewer than two years,

Then: vacation time per year is two weeks.

If: an employee has been employed two or more years,

Then: vacation time per year is three weeks.

If: an employee fails to show up for work for three consecutive
days without calling in,

Then: (s)he can be fired.

A  sales  rule  set  might  include  the  following: 

If: a customer is a repeat customer,

Then: do not do a credit check.

If: a customer orders more than $1000 worth of merchandise in a
single order,

Then: give a 10% discount.

An  inventory-management  rule  set  might  include  the  following: 

If: the inventory of widgets has been below 100 for more than
two days,

Then: notify the inventory manager.

If: the inventory of fradgets goes below 1000,

Then: notify the inventory manager immediately.


Getting employees to articulate these rules, particularly in any sort of formal way,
can be hard. But, for a variety of reasons, including regulatory ones, businesses are increasingly
attempting to do just that. There exists a growing family of commercial tools
to help them. As with expert system tools, these business practice rules engines assist
with extracting rules from employees, checking rule sets for consistency, and applying
rules to help solve problems.

Applications: Art and Entertainment:
Music  and  Games 


In 1968, the album Switched on Bach hit the pop charts and almost overnight,
at least in the United States, just about everyone had heard of electronic music.
But the use of machines as entertainers substantially precedes the modern digital
computer. Some may have occurred as early as the second century B.C. We know
that Leonardo da Vinci built mechanical musical instruments in the 16th century. By
the end of the 20th century, it was no longer possible to imagine the worlds of entertainment
and the arts as distinct from the worlds of digital media and computation.
In this section, we'll briefly sketch some of the ways in which the techniques that
have been presented in this book are used in music and in games. In 0.2.1, we’ll mention
one other entertainment application: the use of context-free grammars to model
ballroom dances.


N.1  Music 

It is natural to think of music as a language. So it should not be surprising that many of
the tools that we have described can be used to model various styles of music and to
help create them. We mention a few of them here. For a comprehensive survey, see
[Roads 1996]. But first we'll consider a short digression to answer the question,
“Why?” What good does it do to make a formal model of a style of music? Roughly the
answer has three parts (suggested by [Roads 1985]):

•  Musicologists  strive  to  understand  the  nature  of  particular  musical  forms  and  styles.  If 
they can build formal models of their analyses, they can test them. And, once tested,
the  model(s)  can  be  used  to  help  determine,  for  example,  the  origins  of  works  whose 
composer  and/or  date  is  not  clear  from  the  historical  record. 

•  Composers  want  new  ways  to  create  new  music. 

•  Some  people  think  it  is  fun. 

N.1.1 Using Markov and Hidden Markov Models

Composers have been using Markov models since well before the advent of the computer;
in fact, they’ve been using them since before Markov described them. Perhaps
the most famous early example is the Musikalisches Würfelspiel (or Musical Dice
Game) published in 1792 and (almost certainly erroneously) attributed to Mozart.
While the origin of this particular game is unclear, it is known that games of this sort
were widely popular in 18th century Europe. The definition of a dice-game composition
consists of:

•  a  numbered  set  of  short  musical  phrases,  and 

• a table with 11 rows and some number (8 in the case of the Musikalisches Würfelspiel)
of columns. Each entry in the table contains the number of one of the musical
phrases.

To “compose” a piece that consists of k phrases, a player rolls a pair of dice k times.
Each roll produces a number between 2 and 12. The player uses the first roll to choose
a row in the table and then selects the phrase whose number appears in column 1. The
second roll is used to select a second phrase from column 2, and so forth. You can try
it yourself.

The Musikalisches Würfelspiel is a 0th order Markov model (since it uses no history
in deciding what to do next). The probabilities associated with each choice are
simply the probabilities of rolling each of the numbers between 2 and 12. Thus the
computational requirements of the game make it easy to play by hand (although it
was also implemented by Lejaren Hiller and John Cage in the program HPSCHD
[Hiller 1972]).
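
The game is simple enough to sketch directly. In the code below, the 11-by-8 table is filled with made-up phrase numbers (1 through 88) purely as a placeholder; it is not the table from the published game. The names ROLL_PROB and compose are our own.

```python
import random
from collections import Counter
from fractions import Fraction

# Probability of each total from 2 to 12 when rolling two fair dice.
counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
ROLL_PROB = {total: Fraction(n, 36) for total, n in counts.items()}

# Placeholder table: 11 rows (totals 2..12) by 8 columns of phrase numbers.
table = [[row * 8 + col + 1 for col in range(8)] for row in range(11)]


def compose(table, rng=random):
    """Roll once per column and pick the phrase in the row that was rolled."""
    piece = []
    for col in range(len(table[0])):
        roll = rng.randint(1, 6) + rng.randint(1, 6)
        piece.append(table[roll - 2][col])  # a total of 2 selects row 0
    return piece


piece = compose(table, random.Random(0))
```

Because each choice depends only on the current roll, the rows are not equally likely: a total of 7 selects its row with probability 1/6, while totals of 2 and 12 each have probability 1/36.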

Since the mid 1950s, with the advent of digital computers, more sophisticated
Markov models, typically of higher order, have been possible. Such models have been
used to create music in a wide variety of genres. Models have been trained on nursery
tunes, the songs of Stephen Foster, hymn tunes, and the works of Mozart and Haydn,
to name a few.

To build a Markov model of a particular musical style, it is necessary to do the
following: 

1.  Collect  a  corpus  of  example  pieces  from  the  chosen  style. 

2. Select one or more important features (such as pitch or note duration) and encode
each piece as a sequence of features.

3. Build the model by defining a state set (corresponding to the features that
were selected in step 2) and training it using the technique described in
Section 5.11.1.


Once  one  or  more  models  have  been  constructed,  it/they  can  be  used  for  either  or 
both  of  the  following  tasks: 


• Composing new pieces in a selected style: To do this, simply run the model that
corresponds to the desired style, allowing it to generate notes according to the
probabilities that it acquired during the training process.

• Composer or style identification: Given two or more models, each corresponding to
a composer or style, a piece whose origin is uncertain can be analyzed to see which
model is most likely to have generated it.
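
Both tasks can be sketched with a first-order model over note names. The training step below simply counts transitions in a corpus, which is a simplification and not necessarily the technique of Section 5.11.1. The two toy corpora, the function names, and the small floor probability used for unseen transitions are all assumptions made for this illustration.

```python
import math
import random
from collections import Counter, defaultdict


def train(corpus):
    """Estimate first-order transition probabilities from note sequences."""
    counts = defaultdict(Counter)
    for piece in corpus:
        for prev, nxt in zip(piece, piece[1:]):
            counts[prev][nxt] += 1
    return {prev: {n: c / sum(ctr.values()) for n, c in ctr.items()}
            for prev, ctr in counts.items()}


def generate(model, start, length, rng=random):
    """Compose by following transitions sampled from the model."""
    piece = [start]
    while len(piece) < length and piece[-1] in model:
        nxt = model[piece[-1]]
        piece.append(rng.choices(list(nxt), weights=list(nxt.values()))[0])
    return piece


def log_likelihood(model, piece, floor=1e-6):
    """Log probability of the piece's transitions; unseen ones get `floor`."""
    return sum(math.log(model.get(p, {}).get(n, floor))
               for p, n in zip(piece, piece[1:]))


# Two tiny made-up "styles".
model_a = train([list("CDECDE"), list("CDEC")])
model_b = train([list("CGCGCG")])

new_piece = generate(model_a, "C", 5, random.Random(0))
mystery = list("CDEC")
scores = {"a": log_likelihood(model_a, mystery),
          "b": log_likelihood(model_b, mystery)}
best_style = max(scores, key=scores.get)
```

Identification here means choosing the model that assigns the uncertain piece the highest likelihood. Higher-order models would condition each note on more than one predecessor.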

The first piece of music composed by a digital computer is the Illiac Suite for
String Quartet, written in 1956 by a program created by Lejaren Hiller and
Leonard Isaacson. The four movements of the Suite correspond to four composition
experiments conducted by Hiller and Isaacson. The first three movements are based
on traditional compositional techniques, implemented on a machine. But the fourth
movement is completely different. It was created by a succession of more and more
powerful Markov models, the last of which was 4th order. Once the piece was written,
it was transcribed and played by musicians on conventional instruments. The Illiac
Suite had a huge impact on computer music, not just for what it was but also because
its creators chose to document the techniques they used in substantial detail [Hiller
and Isaacson 1959].

In creating the Illiac Suite, Hiller and Isaacson used Markov models to generate
notes but not to generate sounds. It was soon obvious that that could be done too. So in
1963, Hiller, along with Robert Baker, used a computer to write Computer Cantata
[Hiller and Baker 1964]. Again Markov models of up to 4th order were used. The models
that generated the musical elements were trained on Charles Ives's Three Places in
New England. But the Cantata also exploited a sequence of Markov models that had
been trained on English sentences. They were used to generate “singing.” The 4th order
model was able to create some sounds that seemed English-like.

Hiller and Baker used the creation of Computer Cantata as a test bed for a more general,
computer-composition tool that they were building. That tool, MUSICOMP, became the
first in a long string of tools that enable composers to create electronic music.

The stochastic techniques that Hiller pioneered have been used, particularly for
electronic sound generation, by many composers in the years since the first appearance
of the Illiac Suite and the Computer Cantata. Good introductions to them appear in
computer music textbooks, for example [Moore 1990]. These techniques are appealing
to some composers because they “may also produce unanticipated possibilities, where
the bonds of a restrictive and inaccurate acoustic theory and of a limited aural imagination
may be broken,” [Jones 1981].

As with any computational problem, a key issue in using Markov models to describe
music is representation. A piece of music can naturally be described as a sequence of
events. So one way to represent it as a Markov model is to build a single model whose
states correspond to atomic events (at some level of granularity). But musical events
are complex. For example, a single note has pitch (which may be described, for example,
as a range and an average), length, intensity, a concluding period of silence, and a
harmonic envelope. So another idea is to describe each musical event as a vector of parameters,
each of which describes some aspect of the note. Then a set of Markov models
can be used to control, separately, the vector elements. An example of the use of this
technique is the piece Macricisum, by Kevin Jones [Jones 1981]. It was created using a set
of nine Markov models.

But now suppose that we wish to train our models on music that is a noisy rendition
of the music that was composed. For example, suppose that, instead of reading a score,
we are listening to a singer sing. Now the probability that we will hear a particular note
is a function of both the probability that that note was intended by the composer and
the probability that the singer rendered the written note in a particular way. In this case,
we can use a hidden Markov model (HMM). The states, and the transitions between
them, will describe the music as it is written. The output probabilities will describe the
likelihood of each written note being performed in a particular way. HMMs of this sort
can be used in a variety of applications. For example, suppose that we have a database
of compositions/songs, each represented as an HMM. Then we can build a retrieval system
that allows a user to hum a song. The system can then identify and retrieve the song
by using the forward algorithm (as described in Section 5.11.2) to choose the HMM that
is most likely to correspond to the hummed tune.
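
A toy version of such a retrieval system is sketched below. Everything specific in it is an assumption made for illustration: each song is modeled as a left-to-right HMM with one state per written note, the singer is assumed to produce the written note with probability 0.8 and each of the four other notes with equal probability, and the database holds two made-up songs.

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Return P(obs | HMM), summing over all state paths."""
    alpha = {s: start_p[s] * emit_p[s].get(obs[0], 0.0) for s in states}
    for o in obs[1:]:
        alpha = {
            s: emit_p[s].get(o, 0.0)
            * sum(alpha[r] * trans_p[r].get(s, 0.0) for r in states)
            for s in states
        }
    return sum(alpha.values())


def song_hmm(written, alphabet="CDEFG", p_correct=0.8):
    """Build a left-to-right HMM whose states are positions in the score."""
    states = list(range(len(written)))
    start_p = {s: 1.0 if s == 0 else 0.0 for s in states}
    # Always advance to the next written note (the last state repeats).
    trans_p = {s: {min(s + 1, len(written) - 1): 1.0} for s in states}
    wrong = (1 - p_correct) / (len(alphabet) - 1)
    emit_p = {s: {a: (p_correct if a == written[s] else wrong)
                  for a in alphabet}
              for s in states}
    return states, start_p, trans_p, emit_p


database = {"rising": song_hmm("CDEFG"), "falling": song_hmm("GFEDC")}
hummed = "CDEGG"  # "rising" with one wrong note
best = max(database, key=lambda name: forward(hummed, *database[name]))
```

Even with one mistake, the hummed tune is far more probable under the first model than under the second, so "rising" is retrieved.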

HMMs can also be used in composition. Suppose that we are given one musical
line (say a melody or a bass line) and the task is to generate other lines in either
harmony or counterpoint with it. We build an HMM whose states correspond to relationships
between a given note and the one that we propose to generate to coincide
with it. The probabilities associated with the transitions correspond to the
likelihood of one such relationship following another. We train the network on a
corpus of harmonized works drawn from the style we want to emulate. We then use
the Viterbi algorithm to find the most likely path through the network given a particular
input line. That path will define an output line that can be played along with
the original one.
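
The following sketch shows the shape of this computation on a three-note melody. The two states ("third" and "sixth", meaning that the generated note lies a diatonic third or sixth below the melody note) and all of the probabilities are invented for the example. In practice they would be learned from a corpus.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (probability, path) for the most likely state sequence."""
    best = {s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        best = {
            s: max(
                ((p * trans_p[r][s] * emit_p[s][o], path + [s])
                 for r, (p, path) in best.items()),
                key=lambda t: t[0],
            )
            for s in states
        }
    return max(best.values(), key=lambda t: t[0])


states = ["third", "sixth"]
start_p = {"third": 0.5, "sixth": 0.5}
trans_p = {"third": {"third": 0.7, "sixth": 0.3},
           "sixth": {"third": 0.4, "sixth": 0.6}}
emit_p = {"third": {"C": 0.2, "D": 0.4, "E": 0.4},
          "sixth": {"C": 0.6, "D": 0.2, "E": 0.2}}

# Note a diatonic third or sixth below each melody note (C major).
BELOW = {("C", "third"): "A", ("D", "third"): "B", ("E", "third"): "C",
         ("C", "sixth"): "E", ("D", "sixth"): "F", ("E", "sixth"): "G"}

melody = ["C", "D", "E"]
prob, path = viterbi(melody, states, start_p, trans_p, emit_p)
harmony = [BELOW[(note, rel)] for note, rel in zip(melody, path)]
```

The most likely path is a sequence of relationships, one per melody note, and applying each relationship to its note yields the accompanying line.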


N.1.2 Using Grammars and Rule-Based Systems

While  purely  stochastic  systems  have  proven  to  be  interesting,  they  fail  to  capture 
everything  about  the  organization  of  interesting  music: 

It is mostly agreed that musical structures are hierarchical. This hierarchical organization
implies that the various parts of a piece of music belong together so
that small parts are joined, such that they form greater parts, which are joined
such that they form greater parts, which are joined such that they form greater
parts, and so on. These structural units are called constituents. With respect to the
hierarchical constituent structure, there is a clear similarity between music and
language. What kind of descriptions are appropriate for such structures?
[Sundberg and Lindblom 1991, pp. 245-246].


Grammars and, more generally, rule-based systems, can be effective tools for associating
structures with strings. Soon after the development of formal linguistic grammars
in the 1950s, musicians and musicologists began exploiting grammatical formalisms,
both to analyze existing music and to compose new pieces. The question then arose,
“What kind of grammar best suits the task?” Following the Rule of Least Power (see
Section 3.3.4), it makes sense to choose the weakest adequate formalism. Regular
grammars can assign only simple structures in which one terminal symbol is generated
at each branch of a parse tree. So they were never seriously considered for this task.

Context-free grammars have been considered. But it turns out that they cannot describe
even very straightforward musical structures. We'll present one simple example,
taken from [West, Howell and Cross 1991]. Sonata (or, sometimes, sonata allegro) form
was commonly used, starting in the late 18th century, to define the structure of the first
and sometimes also the last movement of a symphony, a sonata, or a concerto. A simple
attempt to describe sonata form with a context-free grammar might begin like this:

<movement> → <part1> <part2>

<part1> → <section1> <section2> <section3> <section>*

<part2> → <section1> <section>* <section1> <section2> <section3> <section>*

But  we  immediately  have  two  problems,  both  of  which  are  fundamental  to  the  use 
of  context-free  grammars: 

• Both occurrences of section1 in part2 must be based on the same theme as section1
in part1. Similarly for the two fragments labeled section2. So, ignoring the musical
variations that are allowed, we have the same problem that we had in trying to write
a context-free grammar for WW = {ww : w ∈ {a, b}*} and for describing the type
checking constraints imposed by programming languages like Java.

• There are what linguists call suprasegmental properties (like intonation) that are not
captured by the breakdown of the piece into sections. Part1 starts in a home key.
There is then movement away from the home key through section1 and section3. In
part2, the motion is in the other direction, back to the home key by the end of the
second occurrence of section1. These larger properties cannot be described by a
context-free grammar that must generate each subconstituent independently of all the
others.

As  a  result,  most  grammars  of  music  exploit  more  powerful  formalisms,  primarily 
ones  that  are  equivalent  in  power  to  unrestricted  grammars  and  to  Turing  machines. 

When rules can contain multiple symbols in their left-hand sides, contextual constraints
can be described. We'll mention one example, taken from a much larger body
of work whose goal is to describe, for Western musicians, non-Western musical traditions.
Formal systems have been appealing in this arena since people are working in
traditions with which they are not familiar. It thus becomes important to construct theories
that can be validated without appeal to intuitions that are probably not well
formed. [Kippen and Bel 1992] presents a grammar that describes the rules by which
improvisation is done in North Indian tabla drumming. One of the constituents in the
grammar is named V1. It can be realized in one of three ways, named dha, ti, and -. It
turns out that the way it is realized is dependent on its context. So the grammar contains
rules such as the ones shown in Figure N.1.

Kippen and Bel call these rules “context-sensitive,” as indeed they are. But the overall
grammar that they describe is an unrestricted grammar in our sense, not a context-sensitive
one, since length-reducing rules are employed.


[Figure N.1 lists sample rewriting rules for V1 in three columns: contexts that
generate ti, contexts that generate dha, and contexts that generate -.]

FIGURE N.1 Context-dependent tabla rules.

The language of music appears to share many structural features with natural languages
and many of those features are hard to describe without substantial formal
power.  Linguists  have,  over  the  years,  defined  grammatical  frameworks  that  have  the 
formal  expressive  power  of  unrestricted  grammars  but  that  offer  particular  kinds  of 
representational  advantages  for  describing  linguistic  data.  Musicologists  have  adopted 
many  of  those  formalisms  and  put  them  to  their  own  use. 

Starting in the mid 1960s, the transformational grammar formalism described in
[Chomsky 1965] became popular. The key idea in transformational grammar is that
a sentence has a deep structure that corresponds naturally to its meaning. It also has
a surface structure (namely what we hear or read). While these two are not the
same, there is a systematic relationship between them. A transformational grammar,
then, has two parts: a set of context-free rules that define the deep structure of a
sentence and another set of transformational rules that describe how the deep structure
is to be realized as a linguistic form. It is the existence of the transformational
component that gives these grammars the expressive power of an unrestricted
grammar. Transformational rules can be used to do many kinds of things. For instance,
they can move constituents around. A simple example from English illustrates
this idea. Consider the sentence, “What did Mary see in the park?” The object
of the verb “see” is the interrogative pronoun “what.” It got moved from its natural
place after “see” to the sentence-initial position in which interrogative pronouns
occur in English.

Transformations that describe the relationship between an underlying structure
and a surface form appear to be a natural way to describe many aspects of the structure
of many kinds of music. This notion was articulated by Heinrich Schenker (see,
for example, [Schenker 1935]) well before it became possible to write formal grammars
that could be run on computers. So the transformational model has been used to
describe many genres of music. For example, [Baroni et al. 1984] describes work on
grammars for the melodies of Lutheran hymn tunes, the melodies of French chansons,
and Bach’s chorale harmonies. [Camilleri 1984] describes a similar effort to construct
a grammar for the initial phrases of Schubert's Lieder. Other work has modeled folk
tunes and jazz.

Early efforts at defining grammars of music were conducted by hand. People examined
music, attempted to extract recurrent patterns, and then wrote grammars that described
those patterns. But if the goal is to construct a grammar that accurately
describes the patterns that characterize an existing corpus of work, then it makes
sense to use as large a corpus as possible and to extract the grammar rules automatically.
Once the grammar has been written, it can be used as the basis for an analysis
of the existing corpus or as a way to generate new works. Many genres of classical
Western tonal music have been analyzed in this way. Twentieth century jazz has been
extensively studied in order to extract rules that describe successful improvisation
strategies. One of the most widely used musical pattern extractors and composers is
David Cope's EMI system [Cope 1996]. It has been used to write music in the styles
of Bach, Beethoven, Mozart, Stravinsky, Gershwin, Joplin, and Cope himself. EMI
represents its grammars in a formalism called an augmented transition network (or
ATN). An ATN is essentially a finite state machine except that arbitrary computational
actions, including recursive calls to the state machine itself, may be performed
as tests on any of the arcs that connect one state to another. ATNs have the same
computational power as do Turing machines.

Rule-based systems that do not behave exactly as grammars do have also been used
both to analyze and to generate music. This approach is appealing when dealing with musical
genres that appear to be describable by some fixed set of rules. Some kinds of early
Western music, for example, have that property. Sixteenth century counterpoint is a good
case study. Fux's Gradus ad Parnassum, published in 1725, offered a set of rules and
guidelines that characterize species counterpoint [Fux 1725]. [Schottstaedt 1989] describes
a program for generating counterpoint according to the rules that Fux laid out.
Associated with each rule is a penalty that should attach to any composition each time it
breaks the rule. So, for example, Schottstaedt lists the following rules and their penalties:

Parallel  fifths  are  not  allowed.  (Infinite  penalty.) 

Avoid direct motion to a perfect fifth. (Penalty is 200.)

Avoid unisons in two-part counterpoint. (Penalty is 100.)

Avoid tritones near the cadence in Lydian mode. (Penalty is 13.)

Schottstaedt’s  program  composes  counterpoint  by  running  a  best-first  search 
process  through  a  space  of  possible  compositions.  It  uses  the  penalties  as  its  heuristic 
function  for  evaluating  competing  notes.  Another  example  of  a  rule-based  system  that 
composes in a well-understood style is CHORAL [Ebcioglu 1992], which creates
harmonies to accompany the melodies of Bach chorales.

One  way  to  represent  production  rules  is  as  a  Prolog  program  (as  described  in 
M.2.3).  For  a  description  of  this  approach  to  describing  music,  see  [Schaffer  and 
McGee  1997].  A  simple  example  shown  there  is  the  following  rule,  which  encodes  the 
definition  of  a  passing  tone: 

nonharmonic_tone(passing_tone) :-
    approached(step),
    resolved(step),
    registral_direction(same).

Other approaches to writing musical grammars have also been exploited. For example,
L-systems can be used to model a process in which basic musical characteristics,
like pitch and duration, are encoded as symbols and then transformed into music.


N.2  Classic  Games  and  Puzzles 

By the late 1940s, the idea of a stored-program, electronic computer had been born and
development efforts were well underway. By then, also, Alan Turing and others had
begun talking about computer chess. At the start of the 1950s, Claude Shannon, sometimes
called the father of information theory, published the first paper [Shannon 1950]
on computer chess. By the mid 1950s, there were several programs that played at least
partial games of chess. A few years later, Allen Newell and Herbert Simon, pioneers in the
then new field of artificial intelligence, wrote, “Within ten years a computer will be the
world's chess champion, unless the rules bar it from competition” [Simon and Newell
1958]. The rules imposed no such bar. But a computer did not beat the human world
chess champion until 1997, when Deep Blue beat Garry Kasparov. Why did it
take so long?

The  theory  of  complexity  that  we  describe  in  Part  V  does  not  apply  directly  to 
chess. Nor does it apply to Go, Sudoku, or most of the other classic games and puzzles
with which we are familiar. It doesn't apply because most of those games have
fixed-size  boards  and  the  theory  we  have  built  describes  complexity  as  a  function  of 
increasing  problem  size.  But  chess  and  its  cousins  are  hard  for  the  same  reason  that 
the  sort  of  combinatorial  problems  to  which  our  theory  does  apply  are  hard:  There 
does  not  appear  to  exist  an  algorithm  for  solving  the  problem  without  searching  a 
large  space  of  individual  moves.  For  example,  in  the  middle  of  a  chess  game,  the 
branching factor (the number of alternative moves) is about 35. It is typically estimated
that playing master-level chess requires looking ahead about eight moves. So
choosing a move using lookahead requires examining about 35^8 ≈ 2 · 10^12 moves. For
the game of Go, the situation is even worse. The branching factor is greater (over
300  at  some  points  in  the  game)  and  it  is  substantially  harder  to  write  an  evaluation 
function  that  can  examine  an  intermediate  board  position  and  determine  how  good  it 
is.  While  tables  of  opening  and  closing  moves  can  help  reduce  search  (at  least  in  the 
case  of  chess),  no  one  has  yet  found  a  way  to  avoid  search  in  the  general  case  of  these 
classic  games. 
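
The lookahead estimate above is just the branching factor raised to the search depth. The short Python sketch below reproduces the arithmetic. The function name is ours, and applying a uniform branching factor of 300 to an eight-move Go lookahead is an illustrative assumption (the text says only that the factor exceeds 300 at some points).

```python
def lookahead_positions(branching_factor: int, depth: int) -> int:
    """Count move sequences of length `depth`, assuming a uniform branching factor."""
    return branching_factor ** depth


chess = lookahead_positions(35, 8)    # the figure used in the text
go = lookahead_positions(300, 8)      # assumed uniform factor, for comparison only

print(f"chess: {chess:.2e}")
print(f"go:    {go:.2e}")
```

Under these assumptions the Go search is roughly seven orders of magnitude larger than the chess search at the same depth.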

Complexity  theory  can  help  to  explain  the  observed  difficulty  of  writing  programs 
to  play  these  games  if  we  generalize  the  games  to  boards  of  arbitrary  size. This  isn’t  a 
totally off-the-wall idea. For example, one early (1956) “chess”-playing program used
just a 6 x 6 board in order to make the problem tractable. Using this idea, we have
defined, for example, the language that contains descriptions of generalized chess
configurations from which the current player has a guaranteed win.

• CHESS = {<b>: b is a configuration of an n x n chess board and there is a
guaranteed win for the current player}


N.2.1 Nim

In Example 21.5, we considered the game of Nim. We can describe Nim as a language
as follows.

• NIM = {<b>: b is a Nim configuration (i.e., a set of piles of sticks) and there is a
guaranteed win for the current player}

Recall that, in Example 21.5, we described a straightforward algorithm for deciding,
in any game of Nim, whether the current player has a move that leads to a guaranteed
win. Our algorithm doesn't require actually searching the space of possible move
sequences. It proves that NIM is in P (and, in fact, in L).
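
As a hedged illustration of a search-free decision procedure for Nim, the sketch below uses the standard nim-sum rule. It assumes the normal-play convention, in which the player who takes the last stick wins; the details of Example 21.5 are not reproduced in this section, so both that convention and the function name are assumptions.

```python
from functools import reduce
from operator import xor


def current_player_wins(piles: list[int]) -> bool:
    """Normal-play Nim: the player to move can force a win
    exactly when the bitwise XOR of the pile sizes is nonzero."""
    return reduce(xor, piles, 0) != 0


print(current_player_wins([3, 4, 5]))  # XOR is 2, so the mover can win
print(current_player_wins([1, 2, 3]))  # XOR is 0, so the mover cannot
```

The procedure makes one pass over the piles and keeps only a running XOR, which is why no exploration of move sequences is needed.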

People  get  bored  pretty  quickly  with  Nim  (particularly  once  they  know  the  trick).  It 
appears that, for games and puzzles, greater complexity is a desirable feature.

N.2.2 NP-Complete Puzzles

The  generalizations  of  many  popular  puzzles  have  been  shown  to  be  NP-complete. 
We'll  mention  a  few  of  them  here. 

Sudoku 

Sudoku is typically played on a 9 x 9 grid. The goal is to complete a partially filled-
in grid, such as the one shown in Figure N.2. The rules of the game require that, in any
solution, each of the digits 1 through 9 must occur exactly once in each row, in each
column, and in each marked 3 x 3 subgrid.

In  order  to  be  able  to  talk  about  the  complexity  of  a  Sudoku  problem  as  a  function 
of  the  size  of  its  input,  we  generalize  the  standard  Sudoku  game  to  an  n  x  n  grid, 
where n is a perfect square. We then require that each of the numbers from 1 to n occur
in each row, in each column, and in each of the n subgrids divided as above. Then, to
turn the problem into a decision problem, we restate it as follows.

• SUDOKU = {<b>: b is a configuration of an n x n grid and b has a solution
under the rules of Sudoku}

The  problem  becomes  one  of  deciding  whether  a  solution  exists.  Clearly  completing 
the  grid  can  be  no  easier  than  deciding  whether  such  a  completion  is  possible. 

Sudoku  is  typical  of  a  large  class  of  one-person  games  or  puzzles.  It  can  be  solved  by 
a  straightforward  search  process  in  which  it  suffices  to  find  a  single  solution.  In  the  case 
of  Sudoku,  we  need  to  find  a  single  way  of  filling  in  the  grid  that  meets  the  require¬ 
ments  of  the  game.  Of  course,  experts  tend  to  do  very  little  actual  search  as  they  exploit 
a  variety  of  heuristics  that  prune  the  space  of  alternatives,  often  to  the  point  that  no 
search  is  required.  Those  heuristics  break  down,  however,  for  larger  versions  of  the 
puzzle.  Whether  or  not  heuristics  work  to  simplify  the  problem  some  of  the  time,  the 
search  approach  suggests  a  straightforward  way  to  show  that  SUDOKU  is  in  NP:  It  can 
be  verified  in  polynomial  time  by  a  deterministic  Turing  machine  that  considers  a  pro¬ 
posed  solution  and  simply  checks  that  all  the  constraints  are  satisfied. 
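
The verifier just described can be sketched in a few lines of Python. The function name, the list-of-lists encoding, and the use of 0 to mark a blank cell are assumptions made for illustration. The check does an amount of work polynomial in n.

```python
from math import isqrt


def verify_sudoku(puzzle, solution):
    """Return True iff `solution` is a valid completion of `puzzle`.

    Both arguments are n x n lists of ints; 0 in `puzzle` marks a blank cell
    (an assumed encoding).
    """
    n = len(puzzle)
    k = isqrt(n)
    if k * k != n or len(solution) != n:
        return False
    if any(len(row) != n for row in puzzle) or any(len(row) != n for row in solution):
        return False
    # The solution must preserve every cell that the puzzle already fixed.
    for r in range(n):
        for c in range(n):
            if puzzle[r][c] != 0 and puzzle[r][c] != solution[r][c]:
                return False
    expected = set(range(1, n + 1))
    rows_ok = all(set(row) == expected for row in solution)
    cols_ok = all({solution[r][c] for r in range(n)} == expected for c in range(n))
    boxes_ok = all(
        {solution[br + r][bc + c] for r in range(k) for c in range(k)} == expected
        for br in range(0, n, k)
        for bc in range(0, n, k)
    )
    return rows_ok and cols_ok and boxes_ok


# A 4 x 4 instance (n = 4, subgrids are 2 x 2).
puzzle = [[1, 0, 0, 4], [0, 4, 1, 0], [0, 1, 4, 0], [4, 0, 0, 1]]
solution = [[1, 2, 3, 4], [3, 4, 1, 2], [2, 1, 4, 3], [4, 3, 2, 1]]
print(verify_sudoku(puzzle, solution))
```

Note that the verifier only checks a proposed completion. Finding one is the part that may require search.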

Sudoku is a variant of an older puzzle called Latin squares, which considers only
the row and column (but not the subgrid) constraints. The problem of deciding whether
an instance of an n x n Latin squares puzzle has a solution is NP-complete [Colbourn
1984]. SUDOKU has been shown [Yato 2002] to be NP-complete by reduction from
the Latin squares problem.

FIGURE N.3 A crossword grid.


Crossword  Puzzles 

Next  we  consider  the  problem  of  constructing  a  crossword  puzzle.  More  specifically, 
define: 

• CROSSWORD-PUZZLE-CONSTRUCTION = {<a finite set W ⊆ Σ+ of words,
an n x n matrix B, each square of which is blank or black>: it is possible to form a
valid crossword puzzle by filling every blank square of B with a symbol in Σ in such a
way that every contiguous string of letters, running both horizontally and vertically, is
a word in W}.

For example, if n is 3, all squares of B are blank, and the list of words is age, ago, beg, cab,
cad, dog, then it is possible to construct the grid shown in Figure N.3.
CROSSWORD-PUZZLE-CONSTRUCTION is NP-complete. (See [Garey and Johnson 1979].)
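
For the special case used in the example (an all-blank n x n grid), a brute-force construction can be sketched as follows. The function name is ours, black squares are not handled, and reuse of a word is allowed because the definition above does not forbid it. The search tries every choice of row words, so its cost grows as |W|^n.

```python
from itertools import product


def fill_blank_grid(words: set[str], n: int) -> list[tuple[str, ...]]:
    """Return every way to fill an all-blank n x n grid so that
    each row and each column spells a word in `words`."""
    candidates = sorted(w for w in words if len(w) == n)
    solutions = []
    for rows in product(candidates, repeat=n):
        columns = ["".join(row[c] for row in rows) for c in range(n)]
        if all(col in words for col in columns):
            solutions.append(rows)
    return solutions


words = {"age", "ago", "beg", "cab", "cad", "dog"}
solutions = fill_blank_grid(words, 3)
for row in solutions[0]:
    print(row)
```

One grid it finds has rows cab, age, dog, whose columns read cad, ago, beg.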

Instant  Insanity 

The last example that we'll consider is the Parker Brothers® puzzle known as Instant
Insanity™. The puzzle consists of a set of four plastic cubes. There is a set of four
colors C, and each of the six sides of each of the cubes is painted one of the colors in C. A
solution to the puzzle is an arrangement of the cubes into a single column in such a way
that, on each of the four sides of the column, each color in C appears exactly once. We
can describe the generalized puzzle as the language:

• INSTANT-INSANITY = {<a set B of n blocks and a set C of n colors>: the
blocks in B can be stacked in a single vertical column and, on each of the four sides
of the column, each color in C appears exactly once}.

INSTANT-INSANITY is NP-complete [Garey and Johnson 1979].

N.2.3 Two-Person Games

But now consider two-person games, like chess and Go (and checkers and backgammon
and so forth). Suppose that we have the fragment of a game tree shown in Figure N.4. To
make it easy to follow the discussion that we are about to present, we'll assume that we
always evaluate moves from a single perspective. We'll pick the perspective of the first
player, whom we'll call player1. Let's say that, at A, it is player1's turn. Player1 will
consider its alternatives and attempt to find the best one. We'll call this a maximizing step.
But then something different happens at the next step. Player2 gets to choose a move.
So suppose that player1 chooses to move to C. Then player2 will choose move D or E or
one of their alternatives. When it does so, it will choose the worst move (from player1's
perspective). So we'll call this a minimizing level. Then it's player1's turn again and it
will again attempt to maximize, and so forth.

FIGURE N.4 A game tree when there are two players, a maximizer and a minimizer.

Now suppose that player1 is considering a proposed move sequence, say A, C, E, G.
Player1 can't verify that it can win by choosing to go from A to C just by examining
that sequence. Maybe A, C, E, G does lead to a win. But if A, C, D leads to a loss, then
player1 cannot be guaranteed a win just by choosing to move to C. The problem is that
the second move isn't under its control.

So  CHESS  is  not  obviously  in  NP  since  we  no  longer  have  a  simple  verifier  for  it  (of 
the  sort  that  we  have  for  SUDOKU). 

Another  way  to  think  about  the  reason  that  we  have  failed  to  show  CHESS  to  be 
in NP is to think about a nondeterministic decider (as opposed to a deterministic
verifier). Recall how a nondeterministic Turing machine works. It accepts a string iff
there is any path that accepts. But now consider the problem of deciding whether
there is a guaranteed win for player1. When it is player1's turn, it suffices to find any
move that is guaranteed to be a win for it. But, when it is player2's turn, it is necessary
to show that every move that player2 might choose guarantees the win we seek for
player1. So we can't solve the CHESS problem by (nondeterministically) finding a
single path that leads to a win.

There are, however, two ways we can solve it. The first is to conduct, deterministically,
a depth-first search of the tree of possible moves. Each branch ends whenever one side
wins or a draw is declared. As the search backs up, it can compute win/lose/draw values
for each intermediate node once the values for all of its daughter nodes are known. At
each maximizing step, the win/lose/draw value is the maximum of the values of the
daughter nodes. At each minimizing step, it is the minimum of those values. If the
starting node gets assigned the win value, then it corresponds to a guaranteed win for
player1 and the algorithm will accept. Otherwise, it will reject.
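
The depth-first procedure just described can be sketched in Python on an explicit game tree. The encoding (a leaf is an outcome, an internal node is a list of daughters) and the names are assumptions for illustration. The example tree mirrors the A, C, D / A, C, E, G discussion above, with the rest of the tree omitted.

```python
WIN, DRAW, LOSE = 1, 0, -1  # outcomes from player1's perspective


def game_value(node, maximizing: bool) -> int:
    """Back up win/lose/draw values depth-first.

    A leaf is one of WIN, DRAW, LOSE; an internal node is a list of daughters.
    """
    if isinstance(node, int):
        return node
    values = [game_value(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def guaranteed_win(root) -> bool:
    """Accept iff the starting node (player1 to move) is assigned WIN."""
    return game_value(root, maximizing=True) == WIN


# A -> C; C -> D (a loss) or E; E -> G (a win).
tree_a = [[LOSE, [WIN]]]
print(guaranteed_win(tree_a))  # player2 can answer C with D
```

Only the current path and the values of already-finished siblings need to be retained, which is what the space analysis of this approach relies on.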

Let's analyze the space complexity of this approach. The search requires O(n^2) space
to store one board configuration. The total amount of space required to conduct the
search depends on the number of moves that must be considered (and thus the depth of
the search stack). Let the maximum number of moves be given by some function f(n).
Then conducting the depth-first search can be done in O(f(n) · n^2) space. So what is
f(n)? If the search process ever generates a board configuration that duplicates one that
appeared earlier in the same path, the path can quit. Its options are the same the second
time as they were the first time. So the length of any search path is bounded by the
number of board configurations it may encounter. How many such configurations are there?
That depends on exactly what we mean when we say that we generalize the game to an
n x n board. If we just add board squares but do not add any pieces, then f(n) is a
polynomial. If we take that definition for CHESS, then it is in PSPACE. And, in that case, it
can be shown to be PSPACE-complete. If, on the other hand, we make the perhaps
more reasonable assumption that the number of pieces grows with n, then f(n) grows
exponentially with n and CHESS appears not to be in PSPACE. It is, in that case,
EXPTIME-complete [Fraenkel and Lichtenstein 1981].

An analysis similar to the one that we just did for CHESS can also be applied to the
game of Go. First we must generalize the standard game to a board of arbitrary size.
Doing that, we can define the language:

• GO = {<b>: b is a configuration of an n x n Go board and there is a guaranteed
win for the current player}.

GO is PSPACE-hard [Lichtenstein and Sipser 1980], so no efficient algorithm for it
is likely to exist. Saying anything more precise about the computational complexity of
GO is complicated by the fact that the rules of the game vary and the details of the
rules appear to affect GO's complexity class. For example, using Japanese rules and the
simple “ko” rule (which makes it illegal to make a move that causes the board
configuration to return to its immediately preceding configuration), GO is
EXPTIME-complete [Robson 1983]. Using some other rule sets, the complexity class of GO remains
an open question.

It  is  worth  noting,  however,  that  while  CHESS  and  GO  do  not  appear  to  be  in 
PSPACE,  there  are  two-person  games  that  are.  They  are  games  for  which  it  is  possible 
to place a polynomial bound on the number of moves that can occur in one game. So,
for example, Amazons, Hex, and Othello are all PSPACE-complete.


N.2.4 Alternating Turing Machines

An alternative approach to analyzing languages like CHESS is suggested by the
observation that, at alternating levels of a game search tree, we need to ask, “Is there at least
one winning path from here?” and then, “Are all paths from here winning?” Define an
alternating Turing machine to be a nondeterministic Turing machine with one
difference: Whenever a nondeterministic choice is made, the machine specifies the condition
under which it will accept. It may choose to:

•  accept  whenever  at  least  one  daughter  path  accepts,  or 

•  accept  only  if  all  daughter  paths  accept. 

Note  that  it  can  make  different  choices  at  different  branch  points.  We  can  easily  build 
an  alternating  Turing  machine  to  decide  CHESS.  At  maximizing  levels,  it  suffices  to  find 
one  accepting  path.  At  minimizing  levels,  all  paths  must  win. 
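
The acceptance condition of an alternating machine can be illustrated, very loosely, by evaluating a finite computation tree whose branch points are labeled with one of the two modes. This is a sketch of the acceptance rule only, not a Turing machine simulator, and the class and mode names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    mode: str                                  # "exists" or "forall"
    children: list = field(default_factory=list)


def accepts(node) -> bool:
    """A leaf is True (accepting) or False (rejecting).

    An "exists" branch accepts if at least one daughter path accepts;
    a "forall" branch accepts only if all daughter paths accept.
    """
    if isinstance(node, bool):
        return node
    results = (accepts(child) for child in node.children)
    return any(results) if node.mode == "exists" else all(results)


# A maximizing level (exists) above two minimizing levels (forall).
tree = Branch("exists", [
    Branch("forall", [True, False]),   # one opponent reply defeats this move
    Branch("forall", [True, True]),    # every opponent reply still wins
])
print(accepts(tree))
```

The two modes correspond directly to the two questions asked at alternating levels of the game tree.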

What is the complexity of the alternating Turing machine that decides CHESS? To
answer that question, we begin by defining a set of complexity classes for alternating
Turing machines:

• AP (alternating polynomial time): For any language L, L ∈ AP iff there exists some
alternating Turing machine M that decides L in polynomial time.

• APSPACE (alternating polynomial space): For any language L, L ∈ APSPACE iff
there exists some alternating Turing machine M that decides L in polynomial space.

• AL (alternating logarithmic space): For any language L, L ∈ AL iff there exists
some alternating Turing machine M that decides L in logarithmic space.

A significant result is that alternation buys one complexity class [Chandra, Kozen
and Stockmeyer 1981]. More specifically:

•  AL  =  P:  Alternating  logarithmic  space  is  exactly  as  powerful  as  deterministic  poly¬ 
nomial  time. 

•  AP  =  PSPACE:  Alternating  polynomial  time  is  exactly  as  powerful  as  polynomial 
space. 

• APSPACE = EXPTIME: Alternating polynomial space is exactly as powerful as
exponential time.

Without assuming a polynomial bound on the number of moves that may have to be
considered, we know that CHESS is EXPTIME-complete. So, given the result we just
described, we can conclude that it is also APSPACE-complete. Similarly, since, using
Japanese rules and the simple ko rule, GO is EXPTIME-complete, it is also
APSPACE-complete.

Alternating Turing machines are more useful, however, in cases where the complexity
of a problem is not already known. When the alternating Turing machine model
naturally matches the structure of a problem (as it does in chess, for example), it may be
easier to determine the problem's alternating Turing machine complexity than it would
be to determine its complexity with respect to the standard model. But then, using the
result we just described, its standard complexity class can be inferred.

N.2.5  Game  Programs  that  Win:  The  Minimax  Search  Algorithm 

So far, we have been considering ways of constructing an exact answer to the question,
“Can player1 win?” and we've seen that it is hard. But the analysis that we have done
suggests that there may be a way to get an approximate answer and, at the same time,
play a very good game of chess or checkers.

The backbone of most programs that play two-person games is the minimax
algorithm (so named because it alternately minimizes and maximizes the values of the
moves that are considered). In principle, minimax could be used to search a complete
game tree, following each move sequence until it ends in a win, a loss, or a draw. In
practice, however, complete trees are too large for that to be feasible. For example,
Claude Shannon (mentioned above as the author of the first paper on computer chess
in 1950), estimated that a typical chess game takes 40 moves (for each player) and that
there  is  an  average  of  about  30  choices  at  each  point.  So  the  total  number  of  moves  that 
would be examined in a complete game tree would be about (30 · 30)^40 ≈ 10^120. Modern
estimates put the size of a chess game tree at about 10^123. For comparison, the
number of atoms in the observable universe is estimated to be about 10^79.

FIGURE N.5 A game tree in which a static evaluation function has been applied to
each leaf node.


So practical programs look ahead as many moves (typically called ply) as they
can, given the time they are allotted. They examine the game configurations that
result, applying to each a heuristic function, generally called a static evaluation
function, that measures how promising the configuration is. Then they choose a
move based on those scores. When minimax is used in that way, it becomes a
heuristic search technique of the sort described in Section 30.3. Like other heuristic
search algorithms, it is an approximation technique. It isn't guaranteed to return
an optimal result. But optimality isn't generally required in games. It suffices to
find a move that is good enough to win.

To see how minimax works, consider the two-ply search tree shown in Figure N.5.
Below each leaf node is shown the value of the static evaluation function applied to
that node. We will assume that all evaluations are done from the perspective of player1,
whose turn it is to move from position A. High scores are good for player1.

The job of minimax is to generate this tree and to apply the static evaluation
function to each leaf node. Then it must propagate scores upward so that it can make the
optimal choice from position A. Positions E, F, and G send their scores up to B. Since
the player who chooses from position A is a maximizing player, the player who
chooses from B is a minimizer, who will choose to go to G, which makes B's score 2.
Positions H and I send their scores up to C. Again, this is a minimizing level, so the choice
from C is I and C's score becomes -3. Positions J and K send their scores up to D,
whose score becomes -9. From A, then, there is a choice of three moves: B (with a
score of 2), C (with a score of -3), and D (with a score of -9). From A, the
maximizing player will choose B, which sets A's score to 2. Figure N.6 shows the tree after all
scores have been propagated.


FIGURE N.6 A game tree in which values have been propagated up the tree.

The version of minimax that we will describe searches a game tree. It will not check
to see whether any of the configurations that it examines have already been considered
somewhere else in the tree. It is possible to implement a graph-based version of the
algorithm that does notice and collapse subtrees whenever possible.

Minimax will exploit three functions that encode the facts about the game we are
playing:

• The function move-gen(node) returns a set of nodes that correspond to the game
configurations that can result from making a single move from the configuration
stored at node. It may implement a legal move generator or it may incorporate
additional heuristic information and generate only moves that appear plausible from
the current position.

• The function static(node) returns the result of applying a static evaluation function
to the configuration contained in node. For chess, for example, static might be a
measure of piece advantage, mobility, and control of the center. We will define static
so that it always evaluates from the perspective of player1.

• The function deep-enough(node, depth-so-far) returns True if the path that currently
ends at node is as deep as we want to go in the search and False otherwise. In its
simplest form, deep-enough just counts to some fixed depth limit and then returns True.
In more sophisticated implementations, it considers additional factors. For
example, it's not a good idea to stop searching at a point that is likely to be half-way
through a piece exchange.


Each node of the tree that minimax explores will have three fields: position (a
description of a game configuration), score (if it is known), and best-successor (if one is known).

A straightforward way to describe minimax is as a recursive procedure. To choose a
move from position A, minimax will invoke move-gen(A). Then it will call itself on each
of the resulting nodes. Whenever a branch gets deep enough, static will evaluate its final
node and the resulting value will be passed back up to the parent node. If minimax is
called from the perspective of a maximizing player, then each call it makes on successor
nodes will be from the perspective of a minimizing player, and vice versa. Minimax will
not return a value. Its job, given A, is to fill in A's score. And, unless A is at the last ply
of the search tree, it must also fill in A's best-successor.

To implement these ideas, we'll define a procedure, game-search, whose job is to
create the first invocation of minimax. Game-search can be called with a starting game
configuration, or it can be called at any point during a game, in which case it is given
the current configuration as input. It calls minimax with its input configuration, a depth
(so far) count of 0, and a perspective, that of player1. When game-search ends,
best-successor of the starting node will be filled in with the move that should be chosen (if
there is one). We can state game-search as:


game-search(current: configuration node) =
    Call minimax(current, 0, player1).


Then  minimax  is: 


minimax(position: configuration node, depth: integer, perspective: player) =

1. If deep-enough(position, depth) then set position's score to static(position)
   and return.

2. Set successors to the set returned by move-gen(position).

3. If successors is empty then there aren't any moves to make so set position's
   score to static(position) and return.

4. If successors is not empty then examine them and choose the best. To do this,
   do:

   4.1. For each element move of successors do:    /* Fill in a score for move. */
            Call minimax(move, depth + 1, opposite(perspective)).

   4.2. Considering the scores of all the moves in successors:
            If perspective = player1, set chosen to an element of successors with
            the highest score.
            If perspective = player2, set chosen to an element of successors with
            the lowest score.

5. Set position's score to the score of chosen.

6. Set position's best-successor to chosen.
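
The procedure above translates fairly directly into Python. The sketch below is one possible rendering, not a definitive implementation: the Node class, the way the three game functions are passed as arguments, and the use of lists rather than sets for successors are our choices. The example tree follows the two-ply example of Figure N.5. The scores of G, H, and I and the minimum under D come from the text, while the values given to E, F, and J are assumptions chosen only to be consistent with it.

```python
from dataclasses import dataclass
from typing import Optional

PLAYER1, PLAYER2 = 1, 2


@dataclass
class Node:
    position: object
    score: Optional[float] = None
    best_successor: Optional["Node"] = None


def opposite(perspective: int) -> int:
    return PLAYER2 if perspective == PLAYER1 else PLAYER1


def minimax(node, depth, perspective, move_gen, static, deep_enough) -> None:
    if deep_enough(node, depth):                       # step 1
        node.score = static(node)
        return
    successors = move_gen(node)                        # step 2
    if not successors:                                 # step 3
        node.score = static(node)
        return
    for move in successors:                            # step 4.1
        minimax(move, depth + 1, opposite(perspective), move_gen, static, deep_enough)
    pick = max if perspective == PLAYER1 else min      # step 4.2
    chosen = pick(successors, key=lambda m: m.score)
    node.score = chosen.score                          # step 5
    node.best_successor = chosen                       # step 6


def game_search(current, move_gen, static, deep_enough) -> None:
    minimax(current, 0, PLAYER1, move_gen, static, deep_enough)


# Two-ply example tree; E, F, and J are assumed values.
TREE = {"A": ["B", "C", "D"], "B": ["E", "F", "G"], "C": ["H", "I"], "D": ["J", "K"]}
LEAF = {"E": 8, "F": 5, "G": 2, "H": 0, "I": -3, "J": 7, "K": -9}

root = Node("A")
game_search(
    root,
    move_gen=lambda n: [Node(p) for p in TREE.get(n.position, [])],
    static=lambda n: LEAF.get(n.position, 0),
    deep_enough=lambda n, depth: depth >= 2,
)
print(root.score, root.best_successor.position)
```

As in the text, minimax returns nothing. Its results are left in the score and best_successor fields of the node it was given.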


Heuristic  information  plays  an  important  role  in  minimax  since  it  enables  the  algo¬ 
rithm  to  choose  a  move  without  having  to  search  all  the  way  to  the  end  of  the  game. 
And  it’s  possible  also  that  heuristics  are  embedded  in  move-gen.  But  minimax  is  still  re¬ 
quired  to  search  all  the  subtrees  that  move-gen  creates. 

Now recall A*, the heuristic search algorithm that we introduced in Section 30.3.2.
It exploits heuristic information that enables it to ignore large parts of its search
space. We'd like a way for minimax to do the same thing. For example, look again at
the game tree of Figure N.6. Once the subtree rooted at B has been examined,
minimax knows that player1, a maximizer, is guaranteed a move, B, with a score of 2.
But it continues and looks at the alternatives in hopes of doing better. It considers C
next, and first notices that C has the successor H, whose score is 0. At this point,
without looking at any of H's alternatives, player2, a minimizer, knows that it is guaranteed
that C's score can be made no higher than 0. Player1 already knows it can get a 2 from
B. So it can decide immediately that it won't go to C. It doesn't have to examine I or
any other successors that C might have. We'll call 2 an alpha cutoff. It corresponds to
the lower bound that a maximizing player can count on. Of course, in our simple
example, using an alpha cutoff lets us skip expanding just a single node. But suppose that
the search were going to eight ply. Then using the cutoff would save expanding a
possibly large subtree under I. Next, notice that, if we extend the tree another ply, we can
see that it is possible to exploit a second threshold, which we'll call a beta cutoff. It
corresponds to the upper bound that a minimizing player can count on.

The minimax-with-α-β-pruning algorithm takes advantage of the alpha and beta
cutoffs that we have just described. To see how it works, consider the game tree shown
in Figure N.7. The choice of a move from D is made by a maximizing player, so D gets
a score of 5, which is passed back to B. The minimizing player who chooses from B is
thus guaranteed a score of ≤ 5. So beta is set to 5 and passed down to E. The choice
from E is made by a maximizing player. J gets a score of 7, which is passed back to E. E
is thus guaranteed a score of ≥ 7. But the beta value at E is 5, corresponding to the fact
that the minimizing player above E is guaranteed a score of no more than 5 by going to
D. Because 7 > 5, a beta cutoff occurs and the rest of the successors of E need not be
considered because the move from B to E will never be chosen.

FIGURE N.7 A deeper game tree, in which alpha and beta cutoffs can be used.

At this point, we know that B's score is 5 and it is passed back up to A. So we know that A is guaranteed a score of ≥5. That becomes the value of alpha. It is passed down to C. The choice at F is made by a maximizing player, so F's score becomes 3. It is passed up to C. So C is now guaranteed a score of ≤3. But A is guaranteed a score (as reflected in alpha) of at least 5 by going to B. Because 3 < 5, an alpha cutoff occurs and the rest of the successors of C can be ignored because the move from A to C will never be chosen.

Summarizing, we see that alpha cutoffs correspond to guarantees for maximizing players. They get set (to reflect the best option so far) at maximizing levels and they get used to cut off search at minimizing levels. Similarly, beta cutoffs correspond to guarantees for minimizing players. They get set at minimizing levels and they get used to cut off search at maximizing levels. Both cutoffs must be provided at both levels.

The order in which moves are examined now matters. So we will let move-gen return an ordered list, rather than a set, of moves.

As before, we'll assume that the player to move first is a maximizing player. Since no information is available yet, alpha at the top node starts out as the minimum value that static can compute. Similarly, beta at the top node starts out as the maximum value that static can compute. We can describe game-search-α-β and minimax-with-α-β as follows:

game-search-α-β(current: configuration node) =

   1. Return minimax-with-α-β(current, 0, player1, minimum value static can compute, maximum value static can compute).

minimax-with-α-β(position: configuration node, depth: integer, perspective: player, alpha, beta: integers) =

   1. If deep-enough(position, depth) then set position's score to static(position) and return.

   2. Set successors to the list returned by move-gen(position).

   3. If successors is empty then (there aren't any moves to make so) set position's score to static(position) and return.

   4. If successors is not empty and perspective = player1 (maximizing) then do:

      4.1. Set chosen to the first element of successors.

      4.2. Do the following for each element move of successors, stopping if a cutoff occurs:

           Call minimax-with-α-β(move, depth + 1, opposite(perspective), alpha, beta).
           If move's score > chosen's score then set chosen to move.
           If move's score ≥ beta then all other elements of successors can be skipped. Cut off and exit the loop.
           If move's score > alpha then set alpha to move's score.

   5. If successors is not empty and perspective = player2 (minimizing) then do:

      5.1. Set chosen to the first element of successors.

      5.2. Do the following for each element move of successors, stopping if a cutoff occurs:

           Call minimax-with-α-β(move, depth + 1, opposite(perspective), alpha, beta).
           If move's score < chosen's score then set chosen to move.
           If move's score ≤ alpha then all other elements of successors can be skipped. Cut off and exit the loop.
           If move's score < beta then set beta to move's score.

   6. Set position's score to the score of chosen.

   7. Set position's best-successor to chosen.
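The procedure just given can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive implementation: the Node class, the function name minimax_ab, the visited list, and the fixed depth limit are our own choices. The leaf values are hypothetical, picked only so that they agree with the walk-through of Figure N.7 (D scores 5, J scores 7, F scores 3); the nodes K and G stand in for the successors that are never examined.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A game-tree position; static_value is consulted only at leaves."""
    name: str
    static_value: int = 0
    children: List["Node"] = field(default_factory=list)
    score: Optional[int] = None
    best_successor: Optional["Node"] = None


def minimax_ab(position, depth, maximizing, alpha, beta, visited, max_depth=8):
    """Score position, skipping successors once a cutoff occurs."""
    visited.append(position.name)            # record every node we examine
    successors = position.children           # plays the role of move-gen
    if depth >= max_depth or not successors:  # deep-enough, or no moves
        position.score = position.static_value
        return
    chosen = None
    for move in successors:
        minimax_ab(move, depth + 1, not maximizing, alpha, beta, visited, max_depth)
        if maximizing:
            if chosen is None or move.score > chosen.score:
                chosen = move
            if move.score >= beta:            # beta cutoff
                break
            alpha = max(alpha, move.score)
        else:
            if chosen is None or move.score < chosen.score:
                chosen = move
            if move.score <= alpha:           # alpha cutoff
                break
            beta = min(beta, move.score)
    position.score = chosen.score
    position.best_successor = chosen


def leaf(name, value):
    return Node(name, static_value=value)


# Hypothetical tree shaped like the walk-through of Figure N.7.
D = Node("D", children=[leaf("d1", 3), leaf("d2", 5)])
E = Node("E", children=[leaf("J", 7), leaf("K", 9)])
F = Node("F", children=[leaf("f1", 3), leaf("f2", 1)])
G = Node("G", children=[leaf("g1", 8)])
B = Node("B", children=[D, E])
C = Node("C", children=[F, G])
A = Node("A", children=[B, C])

visited = []
minimax_ab(A, 0, True, float("-inf"), float("inf"), visited)
print(A.score, A.best_successor.name, visited)
```

In this run, K is skipped by a beta cutoff at E and the whole subtree under G is skipped by an alpha cutoff at C, which mirrors the two cutoffs described for Figure N.7.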

The difference in performance between minimax and minimax-with-α-β depends on the order in which moves are considered. If the best moves are considered last, minimax-with-α-β cannot prune any subtrees. It gets its best performance when the best moves are always considered first. Of course, if we always knew the best move before we searched the tree, we wouldn't need to search at all. But it is possible to use heuristics, such as the function static, to make informed guesses about the order in which to consider moves. Without using alpha-beta cutoffs, minimax must search $O(b^n)$ nodes, where b is the average branching factor of the tree and n is the depth of the search. If the best move is always considered first, then minimax-with-α-β searches $O(b^{n/2})$ nodes [Knuth and Moore 1975]. So, in a fixed amount of time, it can search to twice the depth that minimax can.

Neither heuristic search nor the use of cutoffs alters the fact that games like chess and Go are hard, as suggested by the fact that their generalizations are EXPTIME-complete. But a chess-playing program has beaten a reigning world champion. No Go-playing program has yet come close to doing that. Why does Go appear harder than chess? The answer is that constants do sometimes matter. There came a time when there existed exactly enough computing power to search a chess game tree fast enough to win some games against a champion. The game trees for Go (which is typically played on a 19 by 19 board) are bushier and deeper than those in chess. And it is much harder to evaluate a Go position that isn't yet a win or a loss and determine how likely it is to turn into one. So there is not yet enough computing power to enable a program to win a championship match. Of course people can't search any faster than a computer can. In fact, they search a lot slower. This suggests that the way to build a winning Go program is to look for approaches that rely less on search and more on particular patterns and heuristics that are crafted especially to the game. Research on Go is focused on doing exactly that.


N.3  Interactive  Video  Games 

Three of the techniques that we have discussed in this book are widely used in the construction of interactive computer games. We'll mention each of them briefly, but there is much more that could be said. For a more comprehensive discussion of these and other techniques, see any good book such as [Champandard 2004] on the role of artificial intelligence in game development.


N.3.1  Finite  State  Machines 

Finite state machines are used in a variety of ways to define the behavior of interactive games. In P.4, we mention one example, the use of an FSM to describe the behavior of a soccer-playing robot. In this section, we'll mention another.

Many interactive games are structured as finite state machines. Often they are deterministic machines. The advantage of DFSMs is that they are predictable. This means that human players can learn how the games work and so can improve their performance relative to the machines. Other games use nondeterministic FSMs. Their advantage is that they are unpredictable. So they create more plausible opponents and are thus more appealing to play.

In either case, the states of the machine correspond to situations, typically ones in which an agent is doing (or not doing) something. The inputs to the machine correspond to events. But, just like the single characters that are the inputs to the simple language-recognizing machines we have been building, the inputs that correspond to game events can be represented by a finite set of symbols. The job of FSMs, when used in this context, is not to decide a language but rather to describe the behavior of the game. So, properly, we should say that we are using finite state transducers rather than machines. That terminology is, however, rarely used in this context.

In Figure N.8, we show a toy example. In most real games, the nonplayer characters (NPCs) are controlled by a set of FSMs, each of which describes one aspect of behavior. For example, one might be responsible for weapon selection, one might choose a target, and a third might control movement. In addition, realistic applications often exploit probabilities associated with the transitions. In such machines, each of a set of competing transitions is chosen with the probability that is attached to it.
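A machine with probabilities attached to its transitions can be sketched in a few lines of Python. The states, events, and probabilities below are hypothetical, invented only for illustration; they are not taken from any figure in this appendix. Each (state, event) pair maps to a list of competing transitions, and one of them is drawn with the probability attached to it.

```python
import random

# Hypothetical NPC behavior: (state, event) -> [(next_state, probability), ...]
TRANSITIONS = {
    ("patrol", "see_enemy"): [("attack", 0.7), ("flee", 0.3)],
    ("patrol", "hear_noise"): [("investigate", 1.0)],
    ("investigate", "see_enemy"): [("attack", 1.0)],
    ("investigate", "timeout"): [("patrol", 1.0)],
    ("attack", "low_health"): [("flee", 0.8), ("attack", 0.2)],
    ("attack", "enemy_dead"): [("patrol", 1.0)],
    ("flee", "safe"): [("patrol", 1.0)],
}


def step(state, event, rng):
    """Return the next state, chosen among the competing transitions."""
    options = TRANSITIONS.get((state, event))
    if options is None:
        return state  # events with no transition leave the state unchanged
    targets = [target for target, _ in options]
    weights = [prob for _, prob in options]
    return rng.choices(targets, weights=weights, k=1)[0]


def run(events, rng, start="patrol"):
    """Feed a sequence of event symbols to the machine; return the state trace."""
    state = start
    trace = [state]
    for event in events:
        state = step(state, event, rng)
        trace.append(state)
    return trace


rng = random.Random(0)  # seeded so that a run can be repeated
print(run(["hear_noise", "see_enemy", "low_health", "safe"], rng))
```

When every transition list has a single entry with probability 1, the same code behaves as a deterministic machine, which is the predictable case described above.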
N.3.2 Heuristic Search and the A* Algorithm

Consider the problem of computing the movements of the nonplayer characters (NPCs) in an interactive game. One very simple approach to solving this problem is just to let the characters move in what appears to be the correct direction and hope that they don't run into obstacles. Generally a better approach is to plan a route in advance, taking advantage of maps and other terrain information.

Because paths are being computed in real time, as a game is being played, we want an efficient algorithm for computing the best path to be followed for each player. The A* algorithm, as described in Section 30.3.2, is widely used to do this.

Recall that the A* algorithm implements a kind of heuristic search. It makes use of two sorts of cost information:

• The costs that are associated with each of the operators that generate new states: In the game environment, the simplest way to define these costs is the actual distance traveled. If we do that, we'll find the shortest path from the current state to a goal. But other factors besides distance can be added. For example, moves that go through dangerous territory can be assigned higher cost than moves that go through areas that are known to be safe. Moves that reveal to the enemy our knowledge of the terrain can be assigned higher costs than moves that give nothing away, and so forth.

• Estimates of future costs, as provided by the heuristic function: These are guesses about how much it will cost to move from the current state to a goal. The simplest way to make such estimates is to use some kind of geometric distance. If travel in any direction is allowed, Euclidean distance (the length of a straight line between two points) may be a good (and admissible) heuristic. If travel is restricted to roads that form a grid, then Manhattan distance may be a good (and also admissible) heuristic. The Manhattan distance between two points $(x_1, y_1)$ and $(x_2, y_2)$ is $|x_1 - x_2| + |y_1 - y_2|$. In other words, it is the distance that must be traveled in the plane if no diagonal moves are allowed. It is named after the arrangement of (most of) the streets in the borough of Manhattan in New York City. Manhattan distance is often used in games that don't use roads but that model their environments as square or hexagonal grids.

Of  course,  as  in  describing  real  costs,  cost  estimates  can  also  include  estimates  of 
factors  other  than  distance.  For  example,  paths  through  mountainous  terrain  may 
cost  more  than  flat  paths.  Dangerous  paths  and  those  that  require  the  expenditure 
of  scarce  resources  will  also  have  high  cost. 

In  real-time  games,  path-finding  must  be  done  very  efficiently.  It  may  be  more  im¬ 
portant  to  find  a  good  path  quickly  than  to  find  an  optimal  path  several  minutes  from 
now.  So  inadmissible  heuristic  functions  that  do  a  good  job  of  pruning  the  search  space 
are  fairly  widely  used. 
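To make the two kinds of cost information concrete, here is a minimal Python sketch of A* on a small square grid. It rests on several assumptions of our own: the map is a list of strings in which '#' marks an obstacle, every move to a horizontally or vertically adjacent cell costs 1, and the heuristic is Manhattan distance. The function names and the map are hypothetical.

```python
import heapq


def manhattan(a, b):
    """Manhattan distance between grid cells a = (row, col) and b."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def a_star(grid, start, goal, heuristic=manhattan):
    """Return a cheapest path from start to goal as a list of cells, or None."""
    rows, cols = len(grid), len(grid[0])
    # Heap entries are (f, g, cell): f = g + h is the estimated total cost.
    open_heap = [(heuristic(start, goal), 0, start)]
    best_g = {start: 0}       # cheapest known cost from start to each cell
    parent = {start: None}    # back pointers used to rebuild the path
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if g > best_g[cell]:
            continue  # stale entry: a cheaper route to this cell was found
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] == "#":
                continue
            new_g = g + 1  # actual cost of the operator: one step
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                parent[nxt] = cell
                heapq.heappush(open_heap, (new_g + heuristic(nxt, goal), new_g, nxt))
    return None


grid = [
    "....",
    ".##.",
    "....",
]
print(a_star(grid, (0, 0), (2, 3)))
```

Replacing the unit step cost with a terrain-dependent cost, or passing a different heuristic function, changes which paths are preferred without changing the search loop. An inadmissible heuristic can be passed in the same way, at the price of losing the guarantee that the returned path is a cheapest one.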


N.3.3  Rule-Based  Systems  that  Control  NPCs  in  Interactive  Games 

In M.3, we sketched the origin of rule-based systems as descendants of one of the
earliest formal models of computation. We also described their basic architecture.
Rule-based  systems  (or  RBSs)  deserve  mention  in  the  context  of  game  develop¬ 
ment  because  they  have  proved  to  be  useful  for  describing  the  behavior  of  non¬ 
player  characters  (NPCs)  in  interactive  games.  Rules  can  be  used  to  define  a 
variety  of  kinds  of  behaviors.  For  example,  they  can  be  used  for  problem  solving,  in 
much  the  same  way  they  are  used  by  expert  systems.  There  might,  for  instance,  be 
rules  such  as: 

If:  The  laser  is  gone. 

Then:  Someone  else  has  it. 

If:  It  is  night  and  the  power  is  out. 

Then:  The  enemy  cannot  see  me. 


Rules can also be used to define behaviors, again in much the same way in which they
are  used  in  expert  systems.  There  might,  for  instance,  be  rules  such  as: 

If: I have just hit a wall.

Then:  Turn  to  the  right  and  keep  walking. 

If: A grenade has just been thrown.

Then: Duck.

If: I have a shotgun and I do not have ammunition for a shotgun.

Then: Find ammunition.

Rule-based  systems  of  varying  degrees  of  sophistication  are  being  used  to  build 
NPCs.  Simple  systems  consist  of  sets  of  rules  such  as  the  ones  we  just  described.  Anoth¬ 
er  approach  is  to  start  with  SOAR  (described  in  M.3.2),  a  comprehensive,  rule-based 
cognitive  architecture  whose  goal  is  to  model  human  behavior.  The  idea  is  that  agents 
based  on  SOAR  will  exhibit  convincingly  human-level  behavior. 
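A simple system of the first kind, a plain set of if-then rules, can be sketched in Python as follows. This is only an illustration under our own assumptions: facts are represented as strings, a rule fires when all of its conditions are among the current facts, and the fact names are hypothetical encodings of the example rules given above. It is not a model of SOAR.

```python
# Each rule is (set of conditions, conclusion or action).
INFERENCE_RULES = [
    ({"laser_gone"}, "someone_else_has_laser"),
    ({"night", "power_out"}, "enemy_cannot_see_me"),
]

BEHAVIOR_RULES = [
    ({"hit_wall"}, "turn right and keep walking"),
    ({"grenade_thrown"}, "duck"),
    ({"have_shotgun", "no_shotgun_ammo"}, "find ammunition"),
]


def infer(facts):
    """Forward chaining: add conclusions until no rule adds anything new."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in INFERENCE_RULES:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts


def choose_actions(facts):
    """Return the actions of every behavior rule whose conditions all hold."""
    facts = set(facts)
    return [action for conditions, action in BEHAVIOR_RULES if conditions <= facts]


print(sorted(infer({"night", "power_out"})))
print(choose_actions({"have_shotgun", "no_shotgun_ammo", "hit_wall"}))
```

A practical system would also need a policy for choosing among several applicable rules; this sketch simply returns all of them in the order in which the rules are listed.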


APPENDIX O


Applications:  Using  Regular 
Expressions 

Patterns are everywhere. Regular expressions describe patterns. So it's easy to see that regular expressions could be useful in a wide variety of applications. We have already discussed some important uses of regular expressions, including the description of lexical analyzers that are used by context-free parsers and the description of protein motifs that are to be matched against protein sequence databases. In Q.1.2, we'll describe their use in defining XML document types. In this appendix, we briefly highlight the use of regular expressions in programming environments and, more broadly, in computer system tools.

A quick look through the manuals for many programming languages (such as Perl, Python, and Java), as well as systems utilities (such as the Unix file searching program grep or the mailing list management system Majordomo), will turn up a chapter on regular expressions (or "regexes"). But we must be careful. While these systems share a name and some syntax with the pattern language that we described in Chapter 6 and they were certainly inspired by that language, they are, both formally and practically, quite different. In these systems, it is possible, as we'll see below, to write regular expressions that describe languages that are not regular. The added power comes from the presence of variables, whose values may be strings of arbitrary length.

For example, consider the regular expression language that is supported in Perl. Table O.1 shows some of the most important constants and operators in that language. Notice that all of the operators that exist in the regular expression language of Chapter 6 are present here, although union is represented differently. Some new operators, such as replication, word boundary, and nonword boundary, are simply convenient shorthands for patterns that can easily be written in our original language. Much of the rest of the syntax is necessary because of the large character set, including nonprinting characters, that may occur in real texts. But the most important difference between the two languages is the ability to assign a value to a variable. Then the variable can be used in a pattern and we can require that it match the same substring each time it occurs.
Table O.1 Regular expressions in Perl.

Syntax     Name               Description
abc        Concatenation      Matches a, then b, then c, where a, b, and c are any regexes
a|b|c      Union (Or)         Matches a or b or c, where a, b, and c are any regexes
a*         Kleene star        Matches 0 or more a's, where a is any regex
a+         At least one       Matches 1 or more a's, where a is any regex
a?                            Matches 0 or 1 a's, where a is any regex
a{n,m}     Replication        Matches at least n but no more than m a's, where a is any regex
a*?        Parsimonious       Turns off greedy matching so the shortest match is selected
.          Wild card          Matches any character except newline
^          Left anchor        Anchors the match to the beginning of a line or string
$          Right anchor       Anchors the match to the end of a line or string
[a-z]                         Assuming a collating sequence, matches any single character in range
[^a-z]                        Assuming a collating sequence, matches any single character not in range
\d         Digit              Matches any single digit, i.e., string in [0-9]
\D         Nondigit           Matches any single nondigit character, i.e., [^0-9]
\w         Alphanumeric       Matches any single "word" character, i.e., [a-zA-Z0-9_]
\W         Nonalphanumeric    Matches any character in [^a-zA-Z0-9_]
\s         White space        Matches any character in [space, tab, newline, etc.]
\S         Nonwhite space     Matches any character not matched by \s
\n         Newline            Matches newline
\r         Return             Matches return
\t         Tab                Matches tab
\f         Formfeed           Matches formfeed
\b         Backspace          Matches backspace inside []
\b         Word boundary      Matches a word boundary outside []
\B         Nonword boundary   Matches a non-word boundary
\0         Null               Matches a null character
\nnn       Octal              Matches an ASCII character with octal value nnn
\xnn       Hexadecimal        Matches an ASCII character with hexadecimal value nn
\cX        Control            Matches an ASCII control character
\char      Quote              Matches char, used to quote symbols such as . and \
(a)        Store              Matches a, where a is any regex, and stores the matched string in the next variable
\1         Variable           Matches whatever the first parenthesized expression matched
\2                            Matches whatever the second parenthesized expression matched
...                           For all remaining variables
It is possible to write many useful regular expressions without exploiting variables, as we can see in the next two examples:

EXAMPLE O.1 Spam Detection

The following regular expression matches the subject field of at least some email messages that are likely to be spam:

\badv\(?ert\)?\b


EXAMPLE O.2 Email Addresses

The following regular expression scans text looking for valid email addresses:

\b[A-Za-z0-9_%-]+@[A-Za-z0-9_%-]+(\.[A-Za-z]+){1,4}\b
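Patterns like these two can be tried out directly. The sketch below uses Python's re module rather than Perl; that substitution is our own choice, made because the operators used here (\b, character classes, ?, +, and {1,4}) are written the same way in both languages. The sample strings are hypothetical.

```python
import re

# Example O.1: "adv", optionally followed by a parenthesized or bare "ert".
SPAM = re.compile(r"\badv\(?ert\)?\b")

# Example O.2: local part, "@", then a name followed by 1 to 4 dotted suffixes.
EMAIL = re.compile(r"\b[A-Za-z0-9_%-]+@[A-Za-z0-9_%-]+(\.[A-Za-z]+){1,4}\b")


def looks_like_spam(subject):
    """True if the spam pattern occurs anywhere in the subject line."""
    return SPAM.search(subject) is not None


def find_email(text):
    """Return the first substring that matches the address pattern, or None."""
    match = EMAIL.search(text)
    return match.group(0) if match else None


print(looks_like_spam("advert: cheap watches"))
print(looks_like_spam("an adverse reaction"))
print(find_email("Write to alice_99@example.com today"))
```

Note that neither pattern uses a stored variable, so each describes a regular language and could, in principle, be compiled into a finite state machine.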


But, using variables, it is possible to do things that would not be possible without them,
as we can see in the next two examples.


EXAMPLE O.3 WW

Recall the language WW = {ww : w ∈ {a, b}*}. The following regular expression matches all and only strings in WW:

^([ab]*)\1$

The pattern [ab]* can match any string of a's and b's. Whatever it matches is stored in the variable 1 (because [ab]* is the first pattern in parentheses in the expression). Then \1 will match a second occurrence of the same string of a's and b's. The anchors ^ and $ force the pattern to start at the beginning of the target string and end at the end of it. So this pattern matches all and only strings in WW.


EXAMPLE O.4 Finding Duplicated Words

Suppose that we want to proofread some text that we are writing. One common error is to duplicate a simple word like the. The following regular expression matches duplicated words:

\b([A-Za-z]+)\s+\1\b
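Both of these patterns rely on a stored variable, and both can be tested with a short script. As an assumption of convenience, the sketch uses Python's re module, which writes a reference to the first parenthesized group as \1 just as Perl does. The test strings are our own.

```python
import re

# Example O.3: some string w of a's and b's, followed by w again.
WW = re.compile(r"^([ab]*)\1$")

# Example O.4: a word, white space, then the same word again.
DUPLICATED = re.compile(r"\b([A-Za-z]+)\s+\1\b")


def in_ww(s):
    """True exactly when s has the form ww with w in {a, b}*."""
    return WW.match(s) is not None


def first_duplicated_word(text):
    """Return the first word that is immediately repeated, or None."""
    match = DUPLICATED.search(text)
    return match.group(1) if match else None


for s in ["abab", "aba", "", "abba"]:
    print(repr(s), in_ww(s))
print(first_duplicated_word("Paris in the the spring"))
print(first_duplicated_word("the theory"))
```

The final word boundary in the second pattern matters: without it, "the theory" would be reported as a duplication of the.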
By using variables, it is possible to define languages that are not regular. This means that it is no longer possible to compile an arbitrary regular expression into a deterministic finite state machine that can decide whether the expression matches against an input string. While every deterministic finite state machine runs in time that is linear in the length of its input, we cannot make this claim for a regex matcher when variables are allowed. In fact, it can be shown that regular expression matching in Perl (where variables are allowed) is NP-hard. To see why search may be required, consider again the regex of Example O.3. When it is attempting to match against a string w, it may have to try each position in w as it searches for a place to stop matching [ab]* and start matching \1.
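The search over stopping positions can be made explicit. The following Python sketch is our own hand-written illustration, not a description of how Perl's matcher is implemented: it tries each position at which [ab]* might stop, longest first, and counts the attempts. The function name and the attempt counter are hypothetical.

```python
def match_ww(s):
    """Decide whether s = ww for some w in {a, b}*, by explicit search.

    Returns (matched, attempts), where attempts is the number of
    stopping positions that were tried for the first group.
    """
    attempts = 0
    # Try the longest candidate for the first group first, as a greedy
    # matcher would, and back off one character at a time.
    for i in range(len(s), -1, -1):
        attempts += 1
        w = s[:i]
        if set(w) <= {"a", "b"} and s[i:] == w:
            return True, attempts
    return False, attempts


print(match_ww("abab"))
print(match_ww("aba"))
```

For this one pattern the search is cheap, since there are only len(s) + 1 positions to try. The cost grows when a pattern contains several stored variables, because the choices of stopping position multiply.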

Most regular expression languages, including the one we just described in Perl, support not just string matching but also string manipulation. For example, it is easy to write a Perl expression that works like a production in a Post system (see Section 18.2.4). The first part of the expression defines a pattern to be matched and the second part describes a string that should be substituted for the matched substring. The Perl syntax for string substitution is:

$variable =~ s/regex/result/;

When such a command is executed, the first substring in $variable that matches regex will be replaced by result. If the symbol g (for global) is inserted after the last /, all instances of regex will be replaced by result.


EXAMPLE O.5 Deleting Duplicated Words

Continuing with the duplicated word example, we might want to write a substitution command that deletes the second occurrence of any duplicated word (plus the white space in between the words). We could do that, assuming a text string in $text, as follows:

$text =~ s/\b([A-Za-z]+)\s+\1\b/\1/g;


EXAMPLE O.6 A Simple Chatbot

Regular expression substitution can be used to build a simple chatbot. For example, suppose that whenever the user types an expression of the form <phrase1> is <phrase2>, we want our chatbot to reply with the expression Why is <phrase1> <phrase2>? So, on input, "The food there is awful" our chatbot would reply, "Why is the food there awful?" We could do this, again assuming that the input text is stored in the variable $text, as follows:

$text =~ s/^([A-Za-z]+)\sis\s([A-Za-z]+)\.?$/Why is \1 \2?/;
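The two substitution examples can also be tried in Python, whose re.sub function plays the role of Perl's s/regex/result/ operator. This is a sketch under stated assumptions: the function names are our own, re.sub replaces every match by default (like Perl's g flag), and in the chatbot pattern we have added a space to each character class so that a multiword phrase such as "The food there" can be matched. The reply keeps whatever capitalization the user typed.

```python
import re


def delete_duplicates(text):
    """Replace each 'word word' by a single 'word' (all occurrences)."""
    return re.sub(r"\b([A-Za-z]+)\s+\1\b", r"\1", text)


def chatbot(text):
    """Rewrite '<phrase1> is <phrase2>' as 'Why is <phrase1> <phrase2>?'.

    Input that does not have this form is returned unchanged.
    """
    pattern = r"^([A-Za-z ]+)\sis\s([A-Za-z ]+)\.?$"
    return re.sub(pattern, r"Why is \1 \2?", text, count=1)


print(delete_duplicates("Paris in the the spring"))
print(chatbot("The food there is awful"))
print(chatbot("Hello"))
```

A chatbot built this way is a list of such rewriting rules tried in turn, which is what makes the comparison to the productions of a Post system apt.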
In this appendix, we illustrate some examples of early (i.e., before the advent of modern computers in the middle of the 20th century) finite state machines. Then we consider some current applications of FSMs.


P.1  Finite  State  Machines  Predate  Computers 

The history of finite state machines (also called finite state automata) substantially predates the history of anything we would now call a "computer". The Oxford English Dictionary [OED 1989] lists the following among its definitions of the word, "automaton":

3. A piece of mechanism having its motive power so concealed that it appears to
move spontaneously; 'a machine that has within itself the power of motion under
conditions fixed for it, but not by it' (W. B. Carpenter). In 17-18th c. applied to
clocks, watches, etc., and transf. to the Universe and World; now usually to figures
which simulate the action of living beings, as clock-work mice, images which
strike the hours on a clock, etc.

Automata,  in  this  sense,  have  fascinated  people  for  millennia. 

P.1.1  The  Antikythera  Mechanism 

The Antikythera Mechanism, built in Greece around 80 BC, is perhaps the earliest known example of a sophisticated mechanical automaton. Crafted in bronze, it contained at least 30 precision gears inside a wooden case that was covered with writing. It was discovered in 1901, as part of a shipwreck off the Greek island of Antikythera. After about 2,000 years under water, the device is fragmented and corroded, as can be seen from the photograph in Figure P.1. Thus its exact function is unclear.

FIGURE P.1 A fragment of the Antikythera Mechanism.

It appears, however, to have been an astronomical calculator that was substantially more sophisticated than any others that are known to have been built for at least another 1,000 years. Using modern techniques, researchers have been able to analyze the mechanism and to build a model that describes a likely hypothesis for how the mechanism worked. Figure P.2 (a) shows a front view of that model; Figure P.2 (b) shows a rear view.

FIGURE P.2 Two views of a modern reconstruction of the Antikythera Mechanism.


P.1.2 The Prague Orloj

FIGURE P.3 The Prague orloj.

Another spectacular example of an early automaton is the Prague orloj, shown in Figure P.3(a). The orloj is an astronomical clock mounted on a wall of the old town city hall. The original clock was built in 1410. At that point, it consisted of just an astronomical dial, whose state was controlled by three gears (with 365, 366 and 379 cogs) on the same axle. The state of the dial represented the positions of the sun, the moon, and the stars. Later, a calendar dial was added beneath the original one. Later still, three sets of figures were added:

•  A  set  of  four  figures  that  represent  threats  to  the  city:  a  skeleton  representing 
death,  a  Turk,  a  miser,  and  vanity.  While  these  figures  do  not  move,  they  do 
have  moving  parts.  They  are  shown,  next  to  the  original  astronomical  dial,  in 
Figure  P.3(b). 

•  A  set  of  four  figures  that  represent  virtues:  an  angel,  a  chronicler,  a  philosopher, 
and  an  astronomer.  These  figures  do  not  move  at  all. 

•  The  twelve  Apostles. 

As  the  hour  is  about  to  chime,  the  skeleton  tolls  the  bell.  Then  the  Apostles  parade 
through  two  small  doors  above  the  original  dial.  Then,  as  the  clock  chimes,  the  Turk 
shakes  his  head,  the  miser  watches  his  bag  of  money,  and  vanity  admires  itself  in  the 
mirror. 

The Prague orloj, as well as other early mechanical clocks, is typical of a simple class of finite automata that accept no input except the passage of time.

P.1.3 The Abacus

The abacus, shown in Figure P.4, on the other hand, does accept input. In one form
or  another,  the  abacus  has  been  in  use  for  over  2,000  years.  It  is  a  computer  whose 
inputs  (bead  movements)  correspond  to  the  steps  required  to  perform  a  calculation 
and  whose  state  corresponds  to  the  result  of  performing  the  calculation. 

Modern  computers  are,  in  some  sense,  finite  state  devices,  since  the  actual  uni¬ 
verse  (or  at  least  the  part  of  it  that  we  can  observe)  does  not  contain  an  infinite 
number  of  molecules  that  could  be  used  to  encode  memory.  But  we  model  them  as 
Turing  machines  because  there  is  no  a  priori  upper  bound  on  the  amount  of  memo¬ 
ry.  New  tapes  or  disks  can  always  be  provided.  But  the  abacus  is  different.  When  an 
abacus  is  built,  the  largest  number  it  can  record  is  fixed.  So  it  truly  is  a  finite  state 
computer. 


P.1.4 Programmable Automata and the Jacquard Loom


A  loom  is  a  finite  state  machine  whose  states  correspond  to  configurations  of  the  warp 
threads.  A  weaver  works  a  loom  by  throwing  a  shuttle,  wound  with  the  weft  threads, 
back  and  forth  through  the  warp.  By  raising  and  lowering  the  warp  threads,  the  weaver 
can  create  a  pattern.  The  shuttle  will  fly  below  all  raised  warp  threads  and  above  all 
lowered  ones.  A  simple  pattern  can  be  created  from  a  two  state  machine:  In  the  first 
state,  all  the  even  numbered  threads  are  raised.  The  shuttle  is  thrown,  and  then  the 
machine enters the second state, in which only the odd numbered threads are raised.
The  shuttle  is  thrown  and  the  machine  returns  to  state  one. 


But  more  intricate  patterns  require  long  sequences  of  states,  in  each  of  which  a  care¬ 
fully  selected  set  of  warp  threads  has  been  raised.  Since  a  loom  may  be  required  to 
weave  one  pattern  one  week  and  a  different  pattern  another  week,  the  patterns  cannot 
pc  built  into  the  loom  itself.  It  must  be  programmable. 

During  the  IK  century,  weavers  tried  various  techniques,  other  than  by  hand,  for 
raising  and  lowering  the  warp  threads.  In  1K()1%  Joseph  Marie  Jacquard  created  the 
looni  R  that  bears  his  name.  An  example  of  a  Jacquard  loom  is  shown  in  Figure  P.5. 
Jacquard  s  idea  was  to  encode  each  pattern  as  a  loop  of  punched  cards.  Each  card 
encoded  one  row  of  the  pattern.  The  cards  passed  by  a  set  of  pins,  which  could  go 
through  the  card  in  the  positions  of  the  holes.  Each  hole  controlled  a  hook,  which  was 
attached  to  one  or  more  warp  threads.  If  the  hook  was  raised,  it  pulled  up  the  warp 
threads  to  which  it  was  attached. 

FIGURE P.5 A Jacquard loom.


Each loop of cards defines a specific finite state machine with n states, where n is the
number of cards in the loop. Every such machine has an input alphabet that consists of
the single symbol next, which is issued by the weaver each time after the shuttle is
thrown. Each time it gets the next signal, the loom moves from one state (warp
configuration) to the next.
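
The loop of cards can be sketched as a cyclic machine in a few lines of Python. This is only an illustration: the encoding of a card as a set of raised thread numbers, and the names `loom`, `evens`, and `odds`, are assumptions made for the example and do not come from the text.

```python
from itertools import islice

def loom(cards):
    """Yield the warp configuration used for each throw of the shuttle.

    `cards` is a non-empty list of sets; each set names the warp threads
    that the corresponding card raises (a hypothetical encoding).
    """
    state = 0                             # index of the current card
    while True:
        yield cards[state]                # threads raised in this state
        state = (state + 1) % len(cards)  # the single input symbol: next

# The two-state pattern described earlier, on a loom with 8 warp threads.
evens = {2, 4, 6, 8}
odds = {1, 3, 5, 7}
rows = list(islice(loom([evens, odds]), 4))
```

With n cards the generator cycles through exactly n states, which mirrors the claim that each loop of cards defines an n-state machine.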


P.2  The  Towers  of  Hanoi:  Regular  Doesn't  Always 
Mean  Tractable 

The Towers of Hanoi problem was invented by François Édouard Anatole Lucas,
who published it in 1883 under the name of M. Claus. To solve the problem it is
required  to  move  a  stack  of  disks  from  one  pole  to  another  while  obeying  a  couple  of 
simple  rules.  An  example  of  a  starting  configuration  is  shown  in  Figure  P.6. 

FIGURE P.6 The Towers of Hanoi.

Various stories have been created to go along with the problem. One version is the following:

In a monastery in India there are three poles and 64 golden disks, each of a
different diameter. When God created the universe, he stacked the disks on
the first of the poles, with the largest on the bottom. The remaining disks were
stacked in order of size, with the smallest on the top. The monks were given
the task of moving all 64 disks to the last pole. But the disks are sacred, so
there are important rules that must be followed. Whenever a disk is removed
from a pole, it must immediately be placed on some other pole. No disks may
be placed on the ground or held. Further, a disk may never be placed on top of
a smaller disk. The monks were told that they must begin working immediately,
taking turns around the clock. When they finish, the world will end.

It is, in principle, possible for the monks to accomplish this task. The following simple
procedure solves an arbitrary Towers of Hanoi problem with n disks:

towersofHanoi(n: positive integer) =

1. If n = 1 then move the disk to the goal pole.
2. Else:
   2.1. Move the top n - 1 disks to the pole that is neither the current one nor the goal.
   2.2. Move the bottom disk to the goal pole.
   2.3. Move the n - 1 disks that were just set aside to the goal pole.
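
The recursive procedure translates almost line for line into Python. The sketch below is one possible rendering; the pole numbers 1, 2, 3 and the choice to return a list of (from, to) pairs are assumptions made for illustration.

```python
def towers_of_hanoi(n, source=1, goal=3, spare=2, moves=None):
    """Return the list of (from_pole, to_pole) moves that solves n disks."""
    if moves is None:
        moves = []
    if n == 1:
        moves.append((source, goal))                        # step 1
    else:
        towers_of_hanoi(n - 1, source, spare, goal, moves)  # step 2.1
        moves.append((source, goal))                        # step 2.2
        towers_of_hanoi(n - 1, spare, goal, source, moves)  # step 2.3
    return moves
```

Calling `towers_of_hanoi(n)` for small n shows the move count doubling (plus one) with each added disk.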

Fortunately, even if the story of the monks and the end of the world is true, no one
need worry. Using the procedure that we just described, it will take the monks 2^64 - 1
moves to accomplish the task. So, at one move per second, it would take 584,542,046,090
years, 228 days, 15 hours, 14 minutes, 45 seconds. That's approximately 6 × 10^11 years.
The universe, on the other hand, has existed for probably about 12 × 10^9 years (since the
Big Bang).
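
The size of these numbers is easy to check with exact integer arithmetic. The following sketch assumes a year of 365.25 days; the text does not say which calendar convention was used, so only the year and day counts are compared here.

```python
moves = 2**64 - 1                        # one move per second
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 36525 * 24 * 36       # assumption: 365.25-day year, in seconds

years, remainder = divmod(moves, SECONDS_PER_YEAR)
days = remainder // SECONDS_PER_DAY
```

Under that assumption, `years` and `days` agree with the leading figures quoted above.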

Of course, this analysis assumes that the monks cannot find a more efficient strategy.
They might, quite reasonably, look for a nonrecursive solution. People are quite bad at
maintaining recursion stacks in their heads. So imagine the three poles arranged in a
circle (whether they actually are or not). Then let's say that the “next” pole is the next
one in clockwise order and the “previous” pole is the previous one, again in clockwise
order. A clever monk might come up with the following solution:

towersofHanoicircle(n: positive integer) =

1. Move the smallest disk to the next pole.
2. Until all disks form a single stack on some pole other than the starting one do:
   2.1. Make the only legal move that does not involve the smallest disk.
   2.2. Move the smallest disk to the next pole.

If the number of disks is odd, this technique will move all of the disks from the starting
pole to the next one. If the number of disks is even, it will move all of the disks from the
starting pole to the previous one.
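
A direct simulation makes the parity claim easy to test for small n. In this sketch the poles are numbered 0, 1, 2 around the circle, so the "next" pole of p is (p + 1) mod 3; that numbering and the function name are assumptions for illustration.

```python
def towers_of_hanoi_circle(n):
    """Solve n disks iteratively; return (moves, index of the final pole)."""
    poles = [list(range(n, 0, -1)), [], []]   # pole 0 holds disks n..1, smallest on top
    moves = []

    def move(src, dst):
        poles[dst].append(poles[src].pop())
        moves.append((src, dst))

    def done():
        # All disks form a single stack on a pole other than the starting one.
        return any(len(p) == n for p in poles[1:])

    small = 0                                 # pole that holds the smallest disk
    move(small, (small + 1) % 3)              # step 1
    small = (small + 1) % 3
    while not done():                         # step 2
        # Step 2.1: the only legal move between the two other poles.
        a, b = [p for p in range(3) if p != small]
        if not poles[a] or (poles[b] and poles[b][-1] < poles[a][-1]):
            move(b, a)
        else:
            move(a, b)
        # Step 2.2: move the smallest disk to the next pole.
        move(small, (small + 1) % 3)
        small = (small + 1) % 3
    final = next(i for i, p in enumerate(poles) if len(p) == n)
    return moves, final
```

For odd n the returned pole index is 1 (the next pole); for even n it is 2 (the previous pole).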

This technique seems quite different from the first. It can easily be implemented,
even by a child, for small values of n. But it actually requires exactly the same number
of moves as does the recursive technique. And no better technique exists.

So the shortest solution to the 64-disk problem is very long. Nevertheless, the system
of poles and disks can easily be modeled as a nondeterministic finite state machine. The
start state is the one in which all 64 disks are stacked on the first pole. The accepting state
is the one in which all 64 disks are stacked properly on the goal pole. Because there is a
finite number of disks and the position of each disk can be uniquely described by naming
one of the three poles, the number of distinct states of the system, and thus of the machine
we'll build to model it, is finite. Finite but not tractable: This system has 3^64 states
(because each of the 64 disks can be on any one of the three poles). The transitions of the
machine correspond to the legal moves (i.e., those that satisfy the rule that all disks must
be on a pole and that no disk may be on top of a smaller one). Each transition can be labeled
with one of six symbols: 12 (meaning that the top disk from pole 1 is removed and
placed on pole 2), 13, 21, 23, 31, and 32. To make the machine as simple as possible, we
have left out transitions that pick up a disk and put it right back in the same place.

We  can  define  the  Towers  of  Hanoi  language  to  be  the  set  of  strings  that  correspond 
to  move  sequences  that  take  the  system  from  its  start  state  to  its  accepting  state.  The 
Towers  of  Hanoi  language  is  regular  because  it  is  accepted  by  the  Towers  of  Hanoi  FSM 
as  we  just  described  it.  And  it  is  infinite,  since  there  is  no  limit  to  the  number  of  times  a 
disk can be moved between poles. But the shortest string in the language has length
2^64 - 1 (namely the length of the optimal sequence of moves that solves the problem).
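
For a handful of disks the whole machine can be built and searched explicitly. The sketch below represents a state as a tuple that names the pole of each disk (so there are 3^n states), generates the transitions labeled 12, 13, 21, 23, 31, and 32, and uses breadth-first search to find a shortest string in the language. The representation and the function names are assumptions chosen for illustration; the approach is hopeless for n = 64, which is the point of this section.

```python
from collections import deque
from itertools import product

def legal_moves(state):
    """Yield (label, next_state) for each legal move out of `state`.

    state[d] is the pole (1, 2, or 3) that holds disk d; disk 0 is the smallest.
    """
    top = {}                                  # pole -> smallest disk on that pole
    for disk in reversed(range(len(state))):
        top[state[disk]] = disk
    for src, dst in product((1, 2, 3), repeat=2):
        if src != dst and src in top and (dst not in top or top[src] < top[dst]):
            nxt = list(state)
            nxt[top[src]] = dst
            yield f"{src}{dst}", tuple(nxt)

def shortest_solution(n):
    """Breadth-first search from the start state to the accepting state."""
    start, goal = (1,) * n, (3,) * n
    path = {start: []}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            return path[state]
        for label, nxt in legal_moves(state):
            if nxt not in path:
                path[nxt] = path[state] + [label]
                queue.append(nxt)
```

For small n, the length of `shortest_solution(n)` can be compared with 2^n - 1.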


P.3  The  Arithmetic  Logic  Unit  (ALU) 

In most computer chip designs, the ALU performs the fundamental operations of integer
arithmetic, Boolean logic, and shifting. The ALU's operation can be modeled as a
finite  state  transducer,  using  either  a  Moore  machine  (in  which  outputs  are  associated 
with  states)  or  a  Mealy  machine  (in  which  outputs  are  associated  with  transitions). 

P.3.1  An  Adder 

As an example, consider a simple binary adder, shown in Figure P.7. Two numbers can be
added by adding their digits right to left. So we can describe an adder as a Mealy machine
whose input is a stream of pairs of binary digits (one digit from each of the two numbers
to be added). The machine has two states, one of which corresponds to a carry-in bit of 0
and the other of which corresponds to a carry-in bit of 1. When the machine is reset it
enters the state corresponding to no carry (i.e., a carry-in bit of 0). This simple one-bit adder
can be embedded into a larger system that adds numbers of any fixed number of bits.
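
The two-state adder can be sketched directly: the state is the carry bit, and each transition emits one sum bit. The helper `add` and its `width` parameter are assumptions added only to exercise the machine on ordinary integers.

```python
def mealy_adder(bit_pairs):
    """Run the two-state Mealy adder on (a, b) bit pairs, least significant first."""
    carry = 0                        # reset state: no carry
    out = []
    for a, b in bit_pairs:
        total = a + b + carry
        out.append(total % 2)        # output attached to the transition
        carry = total // 2           # next state: the new carry-in bit
    return out, carry

def add(x, y, width=8):
    """Add two non-negative integers using a fixed number of bits."""
    pairs = [((x >> i) & 1, (y >> i) & 1) for i in range(width)]
    bits, carry = mealy_adder(pairs)
    return sum(bit << i for i, bit in enumerate(bits)), carry
```

The second component of the result is the final carry, which signals overflow for the chosen width.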

P.3.2 A Multiplier

Binary  adders  can  also  be  used  as  building  blocks  for  binary  multipliers.  Figure  P.8 
shows a schematic diagram that describes the behavior of an 8-bit multiplier. The multiplier
can be implemented as the finite state transducer shown in Figure P.9.


FIGURE P.8 A schematic diagram for an 8-bit multiplier.

P.4 Controlling a Soccer-Playing Robot

A  finite  state  machine  may  not  be  the  best  way  to  model  all  of  the  complexity  that  is 
required  to  solve  many  kinds  of  hard  problems.  But  an  FSM  may  be  a  good  way  to 
start.  There  exist  good  tools  for  building  FSMs  and  for  displaying  them  graphically  so 
that  their  structure  is  obvious  to  their  designers.  The  process  of  articulating  the  states 
helps  designers  understand  the  large-scale  structure  of  their  problem.  Experiments 
with  an  FSM-based  prototype  can  be  used  to  highlight  those  parts  of  the  design  that 
require  more  powerful  capabilities.  So  a  reasonable  development  methodology  is: 
Build an FSM as a first shot at solving a problem. Run it. Decide where more sophistication
is required. Add it. Experiment again, and so forth.

Let's  look  at  an  example  of  the  successful  use  of  this  approach.  We  begin  with  a 
statement  of  the  problem: 

“The goal of the international RoboCup soccer initiative is, by 2050, to develop a
team of humanoid robots that is able to win against the official human World Soccer
Champion team. In some sense, the RoboCup challenge is the successor of the chess
challenge (a computer beating the human World Chess Champion) that was solved in
1997 when Deep Blue won against Garry Kasparov.”¹⁷

There exist a number of different RoboCup leagues that focus on different aspects
of this challenge. Figure P.10 shows a Sony Aibo robot. For several years, one of the
leagues was the Four-Legged League, in which teams of four Aibos played on a field
measuring 6m by 4m. The robots operated fully autonomously. So there was no external
control either by people or by computers.

Consider the problem of designing the controller for a Four-Legged League team
member. Clearly each robot must perceive its environment and then decide how to act.
No simple controller will make a robot competitive with a human player at either task.
But a simple controller may provide the basis for a first-generation prototype. Figure P.11
shows a finite state machine that was used to define the behavior of an attacking player
for the Austin Villa team in 2003, the first year that it entered the Four-Legged
competition [Stone et al. 2003].


FIGURE P.10 A Sony Aibo robot.


¹⁷ This description is a very slightly edited version of the description on the RoboCup 2006 website.

FIGURE P.11 A finite state controller for an Aibo soccer player.

The states of this machine correspond to simple behaviors. The transitions between
the states depend on input from the robot’s perceptual systems: vision and localization,
as well as its global map and its joint angles. The states can be described as follows:

• Head Scan For Ball: This is the first of a few states designed to find the ball. While in
this state, the robot stands in place, scanning the field with its head.

•  Turning  For  Ball:  The  robot  is  turning  in  place  with  the  head  in  a  fixed  position 
(pointing  ahead  but  tilted  down  slightly). 

• Walking To Unseen Ball: The robot cannot see the ball itself but one of its teammates
communicates to it the ball’s location. Then the robot tries to walk toward
the ball. At the same time, it scans with its head to try to find the ball.

• Walking to Seen Ball: The robot can see the ball and is walking toward it. The robot
keeps  its  head  pointed  toward  the  ball  and  walks  in  the  direction  in  which  its  head 
is  pointing.  As  the  robot  approaches  the  ball,  it  captures  the  ball  by  lowering  its 
head  right  before  making  the  transition  to  the  Chin  Pinch  Turn  state. 

• Chin Pinch Turn: The robot pinches the ball between its chin and the ground. It
then  turns  with  the  ball  to  face  in  the  direction  in  which  it  is  trying  to  kick. 

• Kicking: The robot is kicking the ball.


These descriptions are slightly edited versions of the ones that appeared in [Stone et al. 2003].




• Recover From Kick: The robot updates its knowledge of where the ball is and
branches to another state. Which state comes next depends on the kick that was just
performed.

• Stopped To See Ball: The robot is looking for the ball and thinks it has seen it. But
it still is not sure it has seen the ball. Possibly the vision system has returned a false
positive. To verify that the ball is actually there, the robot momentarily freezes in
place. Once it has seen the ball for enough consecutive frames, it can take the transition
to Walking to Seen Ball. If it fails to do that, it returns to its previous state.
Note that the Stopped To See Ball state is not shown in the diagram. Instead the
label “Ball is seen”, just above the state Walking to Seen Ball, is a shorthand for the
actual process of transitioning into Walking to Seen Ball from the three states
above it. If, in one of those states, the robot believes it has seen the ball, it enters the
state Stopped To See Ball. Then, if the conditions for believing that the ball has actually
been seen are satisfied, the transition continues into Walking to Seen Ball.

To evaluate the conditions on the transitions of this FSM, the robot controller
exploits the following Boolean-valued functions:

• BallLost returns true if the robot is reasonably confident that the ball is lost. It is a
sticky version of what the vision system is reporting. So, if BallLost is true, then it
will become false only if the vision system reports seeing the ball for several consecutive
frames. Similarly, if BallLost is false, several consecutive frames of not seeing
the ball are required for it to become true.

•  NearBall  is  used  when  the  robot  is  walking  toward  the  ball.  It  returns  true  when  the 
robot  is  close  enough  to  the  ball  to  be  able  to  begin  capturing  the  ball  with  a  chin 
pinch  motion. 

• DetermineAndSetKick is used when making a transition out of WalkingToSeenBall.
It determines whether or not a chin pinch turn is necessary. It also computes
the  angle  through  which  the  robot  should  turn  before  it  kicks,  as  well  as  which  kick 
should  be  executed. 
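
The "sticky" behavior described for BallLost can be sketched as a small counter that flips its answer only after several consecutive contradicting frames. This is a hypothetical reconstruction for illustration, not the Austin Villa code: the class name, the `frames_needed` threshold of 3, and the initial value are all assumptions.

```python
class StickyBallLost:
    """Debounced view of the vision system's per-frame ball reports."""

    def __init__(self, frames_needed=3, lost=True):
        self.frames_needed = frames_needed  # hypothetical threshold
        self.lost = lost                    # current sticky value
        self.streak = 0                     # consecutive frames that contradict it

    def update(self, ball_seen):
        """Feed one vision frame; return the current value of BallLost."""
        contradicts = ball_seen if self.lost else not ball_seen
        self.streak = self.streak + 1 if contradicts else 0
        if self.streak >= self.frames_needed:
            self.lost = not self.lost
            self.streak = 0
        return self.lost

tracker = StickyBallLost()
frames = [True, True, False, True, True, True, False, False]
history = [tracker.update(seen) for seen in frames]
```

In this run a single dropped frame (the third one) resets the streak, so the value changes only after three sightings in a row.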


APPENDIX Q

Applications: Using Grammars


The Oxford English Dictionary (2nd Edition, 1989) gives, as its fifth definition of the
word “grammar”, the following:

a. The fundamental principles or rules of an art or science. b. A book presenting
these in methodical form. (Now rare; formerly common in the titles of books.)

It  goes  on  to  mention  the  following  examples: 

• E. Newman's book, The Grammar of Entomology, 1856.

• Owen Jones's book, Grammar of Ornament, 1870.

• W. Sharp, in Rosetti, 1882, said, “The young poet may be said to have reached the
platform of literary maturity while he was yet learning the grammar of painting.”

• An article in The Listener, 18 September, 1958, said, “Reizenstein's dissonances do
not make one ‘sit up’ in the way Haydn's do if we attend to his musical grammar.”

• An article in The Times, 5 March, 1963, said, “The grammar of the film was
established.”

We  have  been  using  a  more  restricted,  technical  definition  of  the  term.  But  its  wider 
use is not disconnected from our narrower one. Grammars, as we have defined them, can
be  used  to  describe  a  wide  variety  of  phenomena.  We  have  already  seen  that  context-free 
grammars  can  be  used  to  describe  some  or  all  of  the  structure  of: 

•  Artificial  languages  that  have  been  designed  to  facilitate  people's  interaction  with 
computers. For example, we've mentioned the programming languages Algol, Lisp,
and  Java. 

•  Naturally  occurring  phenomena.  For  example,  we  have  considered  written/spoken 
languages  such  as  English  and  Chinese,  as  well  as  other  natural  symbol  systems 
such  as  DNA  and  protein  sequences. 

In  this  appendix,  we  will  consider  other  uses  for  context-free  grammars.  We  will  also 
mention  the  use  of  one  other  formalism,  the  Lindenmayer  (or  simply  L)  system. 

Q.1 Describing Artificial Languages Designed
for  Person/Machine  Interaction 

Imagine that you need a language that can support person/machine communication
or machine/machine communication. Consider the language classes that we
have discussed. We can rate them on a star system, as shown in Figure Q.1. Regular
languages do not have enough expressive power for many applications. For example,
in the regular framework it is not possible to require that delimiters be
matched. The decidable languages, on the other hand, have all the power it is possible
to get. But there are few decision procedures and the ones that do exist are,
quite often, unacceptably slow. The context-free framework is a reasonable compromise
in many cases.

Most  programming  languages  are  mostly  (except  for  type-checking)  context-free. 
We discussed them in Appendix G. In this section we consider other common kinds of
context-free  languages. 

Q.1.1 Query Languages

Query languages allow users to write logical expressions that describe objects that are
to be retrieved. For example, SQL is a widely used query language for relational database
systems. A simple SQL query is the following:

SELECT DISTINCTROW A.x, A.y, A.z
FROM A INNER JOIN B ON A.x = B.x
WHERE (((A.z) = "m") AND ((B.w) = "c"))
OR
(((A.z) = "n"))

There exist context-free grammars that describe the syntax of the various dialects
of SQL. Regular expressions (or regular grammars) are not powerful enough because,
among other things, they cannot describe the language of Boolean expressions.
For example, notice the use of balanced parentheses in the Boolean expression that
occurs in the WHERE clause of the query that we just wrote. It is, on the other hand,
straightforward to write a context-free grammar for such expressions. Using the same
techniques that we used in Example 11.19 to define a grammar of arithmetic expressions,
we saw, in Exercise 11.10, that we can write an unambiguous Boolean expression
grammar that forces left associativity and that implements conventional
precedence rules.
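
To make the idea concrete, here is a small recursive-descent parser for one such grammar. The grammar itself is an assumption made for this sketch (the exercise's own grammar is not reproduced here): Expr → Expr OR Term | Term, Term → Term AND Factor | Factor, Factor → NOT Factor | ( Expr ) | identifier. The left-recursive rules are implemented as loops, which preserves left associativity. The tokenizer is deliberately simplistic and silently ignores characters it does not recognize.

```python
import re

def parse(text):
    """Parse a Boolean expression into nested tuples such as ("OR", "a", "b")."""
    tokens = re.findall(r"[()]|\w+", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        pos += 1
        return tok

    def expr():                      # Expr -> Expr OR Term | Term
        node = term()
        while peek() == "OR":
            take()
            node = ("OR", node, term())
        return node

    def term():                      # Term -> Term AND Factor | Factor
        node = factor()
        while peek() == "AND":
            take()
            node = ("AND", node, factor())
        return node

    def factor():                    # Factor -> NOT Factor | ( Expr ) | identifier
        if peek() == "NOT":
            take()
            return ("NOT", factor())
        if peek() == "(":
            take()
            node = expr()
            take(")")
            return node
        tok = take()
        if tok in (")", "AND", "OR"):
            raise SyntaxError(f"unexpected {tok!r}")
        return tok

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree
```

Because AND is handled one level below OR, `a OR b AND c` groups as `a OR (b AND c)`, and the requirement that every `(` be closed by a matching `)` is exactly the part that no regular expression can enforce.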


[FIGURE Q.1; axis label: Expressive power]

Q.1.2 Markup Languages: HTML and XML

Markup languages allow users to annotate documents with tags that identify functional
components within a document. By defining standard, agreed-upon tags, markup languages
allow multiple users and application programs to perform operations on the
same document. For example, a menu document might contain units that are marked
with a price tag. Then a menu formatting and printing program could right justify each
price on its line. And a restaurant-cataloguing program could extract the prices from
the restaurant’s menu and use them to assign a price category to the restaurant.

The Hypertext Markup Language (HTML) powers the World Wide Web by providing
a standard language in which hypertext documents on the Web can be described.
An HTML document is simply a text string, but that text string describes a set of structural
elements, each of which is delimited by a starting tag and its matching closing tag.
The text between the starting and closing tags will be displayed according to the definition
of the element class that is delimited by those tags. Since elements may be nested
within other elements, HTML is not regular (for the same reason that the language of
balanced parentheses isn’t). It is context-free and can be described with a context-free
grammar. Each element definition defines a new kind of delimiter (a matched pair of
tags). A syntactically valid HTML text must nest the delimiters correctly.


EXAMPLE  Q.1  A  Grammar  for  a  Fragment  of  HTML 

Consider the following syntactically legal fragment of HTML. This fragment contains
a ul or unordered list (generally displayed with bullets before each item).
The list contains two items (each marked as an li, for list item), the first of which
contains a nested unordered list:

<ul>
  <li>Item 1, which will include a sublist</li>
  <ul>
    <li>First item in sublist</li>
    <li>Second item in sublist</li>
  </ul>
  <li>Item 2</li>
</ul>

This fragment could have been generated by the following context-free grammar
(which ignores many details, including the fact that an li can occur only inside
a list):

HTMLtext → Element HTMLtext | ε      /* Text is a sequence of elements.
Element → UL | LI | ...               (and other kinds of elements that are allowed in the body of an HTML document)
UL → <ul> HTMLtext </ul>              /* The <ul> and </ul> tags must match.
LI → <li> HTMLtext </li>              /* The <li> and </li> tags must match.
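
A recognizer for this small grammar can be sketched with one function per nonterminal. Two simplifications are assumptions of the sketch rather than part of the grammar above: runs of plain character data are treated as one more kind of Element, and any text that does not look like a tag or character data is ignored by the tokenizer.

```python
import re

def tokenize(source):
    """Split source into tags such as '<ul>' and a '#text' marker for character data."""
    tokens = []
    for tag, text in re.findall(r"(</?\w+>)|([^<]+)", source):
        if tag:
            tokens.append(tag)
        elif text.strip():
            tokens.append("#text")
    return tokens

def is_html_text(source):
    """Return True if source can be derived from HTMLtext in the grammar above."""
    tokens = tokenize(source)
    pos = 0

    def html_text():                 # HTMLtext -> Element HTMLtext | epsilon
        while pos < len(tokens) and not tokens[pos].startswith("</"):
            element()

    def element():                   # Element -> UL | LI | character data
        nonlocal pos
        tok = tokens[pos]
        if tok == "#text":
            pos += 1
        elif tok in ("<ul>", "<li>"):
            pos += 1
            html_text()
            closing = "</" + tok[1:]
            if pos >= len(tokens) or tokens[pos] != closing:
                raise SyntaxError(f"expected {closing}")
            pos += 1                 # the closing tag matches its opening tag
        else:
            raise SyntaxError(f"unexpected tag {tok}")

    try:
        html_text()
    except SyntaxError:
        return False
    return pos == len(tokens)
```

The recursion in `element` is what tracks arbitrarily deep nesting; a finite state machine has no way to remember which closing tag it is still waiting for.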


In HTML, the set of legal tags (e.g., ul, /ul, li, and /li) is fixed and determined by
the language designers. So it can be built into a grammar such as the one we just described.
But the idea of annotating text with structural tags is useful in all kinds of contexts,
not just the display of text on the Web. To exploit this idea, it's necessary to allow
users to define new tags to suit their needs.

The Extensible Markup Language (XML) does exactly that. Users write definitions
of new document types by specifying the set of legal elements and the tags that
delimit them. Those elements can then be processed by application programs. So some
tags may be used to indicate how an element should be displayed (as in HTML). But
others could be used to define fields in a database, to provide a basis for sorting the
elements, and so on. In I.3.2, we mention RDF, a language for annotating Web resources
so that they can function as part of the Semantic Web. One way to write RDF
expressions is in XML.

There are a couple of formalisms that can be used to define new kinds of XML documents.
We’ll briefly consider document type definitions (DTDs). Each DTD effectively
extends  the  grammar  of  the  base  system  to  include  problem-specific  elements  and  tags. 


EXAMPLE  Q.2  Writing  a  Document  Type  Definition 

Suppose  that  we  want  to  define  a  document  type  that  will  be  used  for  homework 
assignments  handed  out  to  a  class.  Each  such  document  may  contain  some  or  all  of 
the following fields: sequence number, title, due date, body. We can describe this
class  of  documents  with  the  following  DTD: 

<!DOCTYPE homework [
<!ELEMENT seq (#PCDATA)>      /* #PCDATA (parsed character data) is a built-in type.
<!ELEMENT title (#PCDATA)>
<!ELEMENT due (#PCDATA)>
<!ELEMENT body (#PCDATA)> ]>

So a homework document may be composed of four kinds of elements. Each
element will be delimited with one of the four tags, seq, title, due, and body,
that we just defined. An example of a homework document that is consistent with
this definition is:

<homework>
<seq>2</seq>
<title>Regular Expressions</title>
<due>Friday</due>
<body>1. Write a regular expression for the language of
strings of a's and b's that start with a.
2. Write a regular expression for the language of
strings of a's, b's, and c's with at most one a.</body>
</homework>


The  advantage  of  a  structured  document  such  as  this,  over  a  more  standard,  straight 
text  document,  is  that  someone  else  who  also  has  access  to  the  DTD  can  easily  skim 
piles of documents and extract specific pieces of information, say, for example, the titles.
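
As a rough illustration of that kind of skimming, the sketch below pulls the title out of each document in a collection using Python's standard `xml.etree.ElementTree` module. The function name `titles` and the abbreviated sample document are assumptions for the example. Note that this parser does not consult the DTD; it relies only on the element names.

```python
import xml.etree.ElementTree as ET

def titles(documents):
    """Return the text of the <title> element of each homework document."""
    return [ET.fromstring(doc).findtext("title") for doc in documents]

homework = """<homework>
<seq>2</seq>
<title>Regular Expressions</title>
<due>Friday</due>
<body>1. Write a regular expression for the language of
strings of a's and b's that start with a.</body>
</homework>"""

found = titles([homework])
```

The same one-line lookup works for `seq`, `due`, or `body`, which is the practical payoff of tagging the fields.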

Of course, real documents are more complicated. There are typically elements that
occur inside other elements, elements that must occur at least once, elements that may
occur only once, optional elements, elements that may occur only if some other element
also occurs, and so forth. So the language in which DTDs are written functions
very much like Extended BNF (EBNF), described in G.1.1. The DTD specification language,
like EBNF, augments the standard context-free grammar formalism with regular
expressions for describing regular fragments of the target language.

The next example extends the homework document type and illustrates the use of
the regular expression operators concatenation (represented by a comma), union (represented
as |), Kleene star, and at-least-one (represented as +).


EXAMPLE  Q.3  A  More  Flexible  Document  Type  Definition 


In this version, we describe the structure of the body of a homework assignment.
It is made up of zero or more problems. Each problem specifies the number of
points and then a description (the comma indicates concatenation). The description
may either be an arbitrary string (in the case of a single-part problem), or a
list of one or more parts.


<!DOCTYPE homework [
<!ELEMENT seq (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT due (#PCDATA)>
<!ELEMENT body (problem*)>                      /* Zero or more problems.
<!ELEMENT problem (points, description)>        /* A points element followed by a description.
<!ELEMENT points (#PCDATA)>
<!ELEMENT description (#PCDATA | multipart)>    /* A single-part question can be any sort of text.
                                                   Or the question may be multipart.
<!ELEMENT multipart (part+)>                    /* At least one part.
<!ELEMENT part (#PCDATA)>
]>


Although the DTD syntax is different from the one we have been using for grammar
rules, it provides the same information that grammar rules do. Each element declaration
in a DTD effectively augments the grammar that defines the strings that are
legal in an XML document that has been written to conform to that DTD.
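
One way to see the regular-expression flavor of element declarations is to check each element's sequence of child tags against a regular expression. The sketch below does this for the declarations of Example Q.3. The `CONTENT_MODELS` table is a hand translation written for this illustration, not the output of any DTD tool, and the check ignores character data entirely.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical hand translation of the Example Q.3 declarations.
# Each child tag is followed by one space so that tag names cannot run together.
CONTENT_MODELS = {
    "body": r"(problem )*",            # (problem*)
    "problem": r"points description ", # (points, description)
    "description": r"(multipart )?",   # (#PCDATA | multipart)
    "multipart": r"(part )+",          # (part+)
    "seq": r"", "title": r"", "due": r"", "points": r"", "part": r"",
}

def conforms(element):
    """Check the child sequence of `element` and, recursively, of its descendants."""
    model = CONTENT_MODELS.get(element.tag)   # undeclared tags are not checked
    if model is not None:
        children = "".join(child.tag + " " for child in element)
        if not re.fullmatch(model, children):
            return False
    return all(conforms(child) for child in element)

good = ET.fromstring(
    "<body><problem><points>5</points>"
    "<description>Define a regular language.</description></problem></body>"
)
bad = ET.fromstring(
    "<body><problem><description>x</description>"
    "<points>5</points></problem></body>"
)
```

Here `conforms(good)` succeeds, while `conforms(bad)` fails because the children of `problem` appear in the wrong order. The nesting of elements is handled by the recursion, and the order and repetition of children by the regular expressions, which parallels the split between the context-free and regular parts of the formalism.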


EXAMPLE  Q.4  Viewing  a  DTD  as  a  Grammar 


Consider again the simple DTD of Example Q.2:


<!DOCTYPE homework [
<!ELEMENT seq (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT due (#PCDATA)>
<!ELEMENT body (#PCDATA)> ]>


That  DTD  defines  four  new  kinds  of  elements  by  effectively  adding  the  rules: 

Element → seq | title | due | body

Each  element  must  be  delimited  by  the  appropriate  tags,  so  the  DTD  also  adds 
the  following  rules: 


seq → <seq> #PCDATA </seq>
title → <title> #PCDATA </title>
due → <due> #PCDATA </due>
body → <body> #PCDATA </body>


Some XML parsers, called validating parsers, check documents to make sure that
they contain only the elements that are defined in the current DTD. But, for efficiency,
there are also nonvalidating parsers, which check only that the core syntax has been
followed and that tags are properly nested.

FIGURE Q.2 A picture drawn with a graphics metalanguage.


XML has been used to define specialized markup languages for hundreds, and probably
even thousands, of specialized application environments.


Q.1.3 Graphics Metalanguages: SVG

Pictures, like the one shown in Figure Q.2, do not look like strings. So sets of pictures
do not appear to be languages. But programs that draw such pictures must be told what
to do. Strings in graphics metalanguages provide such instructions. So while the pictures
aren't strings, their descriptions are.

Many graphics languages exist, some as stand-alone languages and others as extensions
of more general programming languages and environments. With the advent
of the World Wide Web came the need for standards in this arena, as in many others.

The Scalable Vector Graphics (or SVG) language is one proposed standard. SVG
is a language for describing two-dimensional graphics (including interactive and animated
graphics) in XML. The following SVG program drew the figure shown above:

<svg width="100%" height="100%" version="1.1"
xmlns="http://www.w3.org/2000/svg">

<ellipse cx="240" cy="100" rx="200" ry="25"
style="fill:grey; stroke:black"/>

<ellipse cx="220" cy="70" rx-'^OO" ry="40"
style="fill:white; stroke:black"/>

</svg>

SVG  can  be  described  with  a  context-free  grammar. 

Q.2 Describing Naturally Occurring Phenomena

Many kinds of naturally occurring phenomena can usefully be described using grammars
of various sorts. We sketch two of them here. We should point out before we start,
though, that there are many kinds of naturally occurring phenomena that cannot easily be
described within the context-free framework. In N.1.2 we briefly mention some of the
reasons that they are rarely used in music, for example.


Q.2.1 Dance

Dances are patterns in space and in time. Context-free grammars define
one-dimensional strings. But such strings can be used
to describe the characteristics of many kinds of dances if we start by defining
a set of primitives (which we'll then use as the alphabet for our grammar) that
correspond to the basic moves that a dancer might perform.


EXAMPLE  Q.5  A  Grammar  of  the  Foxtrot 

When dancing a foxtrot, one can take either slow (S) steps (that take two beats) or
quick (Q) steps (that take a single beat). Each step may start with either the left
(L) or the right (R) foot. And each step may move forward (F), sideways (W), or
backwards (B), or it may close (C), i.e., bring the feet together. These basic
symbols can be combined to form an alphabet that corresponds to the individual steps
that can make up a dance.

One popular foxtrot form is called the box rhythm. The following grammar
rule (slightly adapted from the larger grammar described in [Herbison-Evans
2006]) describes it (for the man):

B → LS RQW LQC RS LQW RQC

The dancer takes six steps: slow, quick, quick, slow, quick, quick. The man
always starts with his left foot, then alternates feet.
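
The claims made about the box rhythm can be checked mechanically. The following Python sketch is one possible encoding, not part of the cited grammar: it assumes each step symbol is a foot (L or R), then a tempo (S or Q), then an optional move (F, W, B, or C), and it reads the quick steps of the rule as Q. All function names are hypothetical.

```python
# Primitive symbols from Example Q.5.
BEATS = {"S": 2, "Q": 1}           # slow = two beats, quick = one beat
FEET = {"L", "R"}                  # left or right foot
MOVES = {"F", "W", "B", "C", ""}   # "" = no move recorded (an assumption)

# Right-hand side of the box-rhythm rule, as a list of step symbols.
BOX = ["LS", "RQW", "LQC", "RS", "LQW", "RQC"]


def parse_step(step):
    """Split a step symbol into (foot, tempo, move), validating each part."""
    foot, tempo, move = step[0], step[1], step[2:]
    if foot not in FEET or tempo not in BEATS or move not in MOVES:
        raise ValueError("not a step symbol: %r" % step)
    return foot, tempo, move


def total_beats(steps):
    """Number of beats the whole sequence of steps takes."""
    return sum(BEATS[parse_step(s)[1]] for s in steps)


def alternates_feet(steps):
    """True if no two consecutive steps start with the same foot."""
    feet = [parse_step(s)[0] for s in steps]
    return all(a != b for a, b in zip(feet, feet[1:]))


def rhythm(steps):
    """The slow/quick pattern of the sequence."""
    names = {"S": "slow", "Q": "quick"}
    return [names[parse_step(s)[1]] for s in steps]
```

With this encoding, total_beats(BOX) is 8, so one pass through the box fills eight beats, and alternates_feet(BOX) confirms that the dancer starts on the left foot and then alternates.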


Q.2.2  The  Development  of  Plants 

In Section 24.4, when we introduced Lindenmayer systems (typically called just L-systems),
we pointed out that they were created by the biologist Aristid Lindenmayer as part of his
work on plant development and growth. There we showed a simple L-system that described
highly stylized tree structures. But much more realistic structures can also be produced.
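
As a reminder of how an L-system generates structure, here is a minimal Python sketch of parallel rewriting. The rule sets are illustrative assumptions, not the system from Section 24.4: ALGAE_RULES is Lindenmayer's well-known algae model, and BRANCH_RULES is a hypothetical bracketed rule of the kind used to describe branching, where [ and ] mark the start and end of a branch and + and - stand for turns.

```python
def rewrite(axiom, rules, generations):
    """Apply the rules to every symbol in parallel, once per generation.

    Symbols with no rule are copied unchanged.
    """
    current = axiom
    for _ in range(generations):
        current = "".join(rules.get(symbol, symbol) for symbol in current)
    return current


# Lindenmayer's algae model: A -> AB, B -> A.
ALGAE_RULES = {"A": "AB", "B": "A"}

# A hypothetical branching rule; brackets delimit side branches.
BRANCH_RULES = {"F": "F[+F]F[-F]F"}

algae = [rewrite("A", ALGAE_RULES, n) for n in range(5)]
branch = rewrite("F", BRANCH_RULES, 2)
```

Unlike a context-free grammar, which rewrites one nonterminal at a time, an L-system rewrites every symbol of the string at each step. That is why the algae strings grow in length as 1, 2, 3, 5, 8.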